GitHub user jkff opened a pull request:
https://github.com/apache/beam/pull/3799
[BEAM-2828, BEAM-2750] Introduces FileIO.read() and uses it in Text, Avro
and Xml
This is on top of https://github.com/apache/beam/pull/3759.
* Creates a `ReadableFile` type that's just a utility wrapper over a
`Metadata` and a `Compression`.
* Creates `FileIO.read()`, which returns `ReadableFile`s; this subsumes
BEAM-2750.
* Creates `readFiles()` versions in TextIO and XmlIO that give users of
these IOs access to all the features of `FileIO`. For example, XmlIO does
not explicitly support watching for new files or value providers, but you can
get them by combining `FileIO.match`, `FileIO.read`, and `XmlIO.readFiles`.
/**
 * Like {@link #read}, but reads each file in a {@link PCollection} of {@link ReadableFile}, which
 * allows more flexible usage via different configuration options of {@link FileIO#match} and
 * {@link FileIO#readMatches} that are not explicitly provided for {@link #read}.
 *
 * <p>For example:
 *
 * <pre>{@code
 * PCollection<ReadableFile> files = p
 *     .apply(FileIO.match().filepattern(options.getInputFilepatternProvider()).continuously(
 *         Duration.standardSeconds(30), afterTimeSinceNewOutput(Duration.standardMinutes(5))))
 *     .apply(FileIO.readMatches().withCompression(GZIP));
 *
 * PCollection<Record> output = files.apply(XmlIO.<Record>readFiles()
 *     .withRootElement("root")
 *     .withRecordElement("record")
 *     .withRecordClass(Record.class));
 * }</pre>
 */
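To make the "utility wrapper over a `Metadata` and a `Compression`" idea concrete, here is a minimal standalone sketch using only the JDK. It is not the code from this pull request: the class `ReadableFileSketch`, its nested `Metadata`, `Compression` and `ReadableFile` types, the in-memory `storedBytes` field, and the method names `open()` and `readAsUtf8String()` are all hypothetical stand-ins chosen for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Illustrative model only; none of these types are Beam's actual classes. */
public class ReadableFileSketch {

  /** Hypothetical stand-in for the compression setting. */
  enum Compression { UNCOMPRESSED, GZIP }

  /** Hypothetical stand-in for match metadata: a resource name and its size. */
  static final class Metadata {
    final String resourceName;
    final long sizeBytes;

    Metadata(String resourceName, long sizeBytes) {
      this.resourceName = resourceName;
      this.sizeBytes = sizeBytes;
    }
  }

  /** Pairs metadata with a compression mode and decompresses on read. */
  static final class ReadableFile {
    final Metadata metadata;
    final Compression compression;
    private final byte[] storedBytes; // assumption: in-memory stand-in for a filesystem

    ReadableFile(Metadata metadata, Compression compression, byte[] storedBytes) {
      this.metadata = metadata;
      this.compression = compression;
      this.storedBytes = storedBytes;
    }

    /** Opens the stored bytes, wrapping them in a GZIP decoder when required. */
    InputStream open() {
      try {
        InputStream raw = new ByteArrayInputStream(storedBytes);
        return compression == Compression.GZIP ? new GZIPInputStream(raw) : raw;
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      }
    }

    /** Reads the whole decompressed content as a UTF-8 string. */
    String readAsUtf8String() {
      try (InputStream in = open()) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer)) != -1) {
          out.write(buffer, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      }
    }
  }

  /** Demo helper: gzip-compresses a string into a byte array. */
  static byte[] gzip(String text) {
    try {
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
        gz.write(text.getBytes(StandardCharsets.UTF_8));
      }
      return out.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    String xml = "<root><record>1</record></root>";
    byte[] stored = gzip(xml);
    ReadableFile file = new ReadableFile(
        new Metadata("input.xml.gz", stored.length), Compression.GZIP, stored);
    System.out.println(file.metadata.resourceName + ": " + file.readAsUtf8String());
  }
}
```

The point of the pairing is that a downstream reader such as `readFiles()` only needs the wrapper: it can ask for the content without knowing how the file was matched or which compression was configured upstream.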
R: @reuvenlax
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/jkff/incubator-beam readable-file
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/beam/pull/3799.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #3799
----
commit a9e3e82cadb5e92b430c632ca1503a45eaa2da6d
Author: Eugene Kirpichov <[email protected]>
Date: 2017-08-24T23:31:41Z
Moves Match into FileIO.match()/matchAll()
FileIO will later gain other methods, such as read()/write().
Also introduces FileIO.MatchConfiguration - a common type to use
by various file-based IOs to reduce boilerplate, and uses it in TextIO.
commit ead6af9bf3e2bbe2f53d4e75a4bf0e1a40a92b31
Author: Eugene Kirpichov <[email protected]>
Date: 2017-08-31T23:11:25Z
Introduces FileIO.read()
commit b36b679682e8dcfa6106083069d83a874fac7228
Author: Eugene Kirpichov <[email protected]>
Date: 2017-08-31T23:28:07Z
Uses FileIO.read() in TextIO and AvroIO
commit c38482c102871399eb50551a45b7b79ab8e8fc6e
Author: Eugene Kirpichov <[email protected]>
Date: 2017-08-31T23:43:22Z
Introduces TextIO.readFiles()
commit 3b17c41730a43557d000b6c4662e6572d97fdcd7
Author: Eugene Kirpichov <[email protected]>
Date: 2017-09-01T00:21:20Z
Introduces XmlIO.readFiles
----