[
https://issues.apache.org/jira/browse/BEAM-2828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16149850#comment-16149850
]
ASF GitHub Bot commented on BEAM-2828:
--------------------------------------
GitHub user jkff opened a pull request:
https://github.com/apache/beam/pull/3799
[BEAM-2828, BEAM-2750] Introduces FileIO.read() and uses it in Text, Avro
and Xml
This is on top of https://github.com/apache/beam/pull/3759.
* Creates a `ReadableFile` type that's just a utility wrapper over a
`Metadata` and `Compression`.
* Creates `FileIO.read()` that returns `ReadableFile`s - this subsumes
BEAM-2750.
* Creates `readFiles()` versions in TextIO and XmlIO that give access to
all the features of `FileIO` to all users of these IOs. For example, XmlIO does
not explicitly support watching for new files, or value providers - but you can
get them by combining `FileIO.match`, `FileIO.read`, and `XmlIO.readFiles`.
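
As a rough illustration of the "utility wrapper" idea in the first bullet, here is a minimal plain-Java sketch. It does not use Beam: `ReadableFileSketch` and its nested `Metadata`, `Compression`, and `ReadableFile` types, along with every method name, are hypothetical stand-ins invented for this example, and the "file" is just an in-memory byte array. The point is only that a reader handed such a wrapper does not need to know how the file was matched or which compression was chosen.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Plain-Java illustration only; none of these types are Beam's real classes. */
final class ReadableFileSketch {

  /** Hypothetical stand-in for a compression setting. */
  enum Compression { UNCOMPRESSED, GZIP }

  /** Hypothetical stand-in for match metadata: a name plus the raw bytes of the "file". */
  static final class Metadata {
    final String resourceName;
    final byte[] rawBytes;

    Metadata(String resourceName, byte[] rawBytes) {
      this.resourceName = resourceName;
      this.rawBytes = rawBytes;
    }
  }

  /** Thin wrapper pairing metadata with the compression to apply when reading. */
  static final class ReadableFile {
    private final Metadata metadata;
    private final Compression compression;

    ReadableFile(Metadata metadata, Compression compression) {
      this.metadata = metadata;
      this.compression = compression;
    }

    Metadata getMetadata() { return metadata; }

    Compression getCompression() { return compression; }

    /** Opens the content, transparently decompressing if GZIP was selected. */
    InputStream open() {
      InputStream raw = new ByteArrayInputStream(metadata.rawBytes);
      try {
        return compression == Compression.GZIP ? new GZIPInputStream(raw) : raw;
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      }
    }

    /** Reads the whole (decompressed) content as a UTF-8 string. */
    String readFullyAsUtf8String() {
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buffer = new byte[4096];
      try (InputStream in = open()) {
        int n;
        while ((n = in.read(buffer)) != -1) {
          out.write(buffer, 0, n);
        }
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      }
      return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
  }

  /** Helper for the example: gzip-compresses a string into bytes. */
  static byte[] gzip(String text) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
      gz.write(text.getBytes(StandardCharsets.UTF_8));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return out.toByteArray();
  }
}
```

In this sketch, a caller would build a `Metadata` from `ReadableFileSketch.gzip("<root/>")`, wrap it in a `ReadableFile` with `Compression.GZIP`, and call `readFullyAsUtf8String()` to get the original text back. Any format-specific reader written against the wrapper works the same way for compressed and uncompressed inputs.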
/**
 * Like {@link #read}, but reads each file in a {@link PCollection} of {@link ReadableFile}, which
 * allows more flexible usage via different configuration options of {@link FileIO#match} and
 * {@link FileIO#readMatches} that are not explicitly provided for {@link #read}.
 *
 * <p>For example:
 *
 * <pre>{@code
 * PCollection<ReadableFile> files = p
 *     .apply(FileIO.match().filepattern(options.getInputFilepatternProvider()).continuously(
 *         Duration.standardSeconds(30), afterTimeSinceNewOutput(Duration.standardMinutes(5))))
 *     .apply(FileIO.readMatches().withCompression(GZIP));
 *
 * PCollection<Record> output = files.apply(XmlIO.<Record>readFiles()
 *     .withRootElement("root")
 *     .withRecordElement("record")
 *     .withRecordClass(Record.class));
 * }</pre>
 */
R: @reuvenlax
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/jkff/incubator-beam readable-file
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/beam/pull/3799.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #3799
----
commit a9e3e82cadb5e92b430c632ca1503a45eaa2da6d
Author: Eugene Kirpichov <[email protected]>
Date: 2017-08-24T23:31:41Z
Moves Match into FileIO.match()/matchAll()
FileIO will later gain other methods, such as read()/write().
Also introduces FileIO.MatchConfiguration - a common type to use
by various file-based IOs to reduce boilerplate, and uses it in TextIO.
commit ead6af9bf3e2bbe2f53d4e75a4bf0e1a40a92b31
Author: Eugene Kirpichov <[email protected]>
Date: 2017-08-31T23:11:25Z
Introduces FileIO.read()
commit b36b679682e8dcfa6106083069d83a874fac7228
Author: Eugene Kirpichov <[email protected]>
Date: 2017-08-31T23:28:07Z
Uses FileIO.read() in TextIO and AvroIO
commit c38482c102871399eb50551a45b7b79ab8e8fc6e
Author: Eugene Kirpichov <[email protected]>
Date: 2017-08-31T23:43:22Z
Introduces TextIO.readFiles()
commit 3b17c41730a43557d000b6c4662e6572d97fdcd7
Author: Eugene Kirpichov <[email protected]>
Date: 2017-09-01T00:21:20Z
Introduces XmlIO.readFiles
----
> Create FileIO
> -------------
>
> Key: BEAM-2828
> URL: https://issues.apache.org/jira/browse/BEAM-2828
> Project: Beam
> Issue Type: New Feature
> Components: sdk-java-core
> Reporter: Eugene Kirpichov
> Assignee: Eugene Kirpichov
> Fix For: 2.2.0
>
>
> Let's have FileIO as a namespace for transforms such as: current
> Match.filepatterns(); FileIO.read() for reading whole files and
> FileIO.write() for writing whole files, etc.
> Target for 2.2.0 is just creating the namespace and moving
> Match.filepatterns() into it (https://github.com/apache/beam/pull/3759).
> Related JIRAs: https://issues.apache.org/jira/browse/BEAM-2750 and
> https://issues.apache.org/jira/browse/BEAM-2751