GitHub user jkff opened a pull request:

    https://github.com/apache/beam/pull/3799

    [BEAM-2828, BEAM-2750] Introduces FileIO.read() and uses it in Text, Avro and Xml

    This is on top of https://github.com/apache/beam/pull/3759.
    
    * Creates a `ReadableFile` type that's just a utility wrapper over a `Metadata` and a `Compression`.
    * Creates `FileIO.read()`, which returns `ReadableFile`s; this subsumes BEAM-2750.
    * Creates `readFiles()` versions in TextIO and XmlIO that give all users of these IOs access to all the features of `FileIO`. For example, XmlIO does not explicitly support watching for new files or value providers, but you can get both by combining `FileIO.match`, `FileIO.read`, and `XmlIO.readFiles`.
    
        /**
         * Like {@link #read}, but reads each file in a {@link PCollection} of {@link ReadableFile},
         * which allows more flexible usage via different configuration options of
         * {@link FileIO#match} and {@link FileIO#readMatches} that are not explicitly provided for
         * {@link #read}.
         *
         * <p>For example:
         *
         * <pre>{@code
         * PCollection<ReadableFile> files = p
         *     .apply(FileIO.match().filepattern(options.getInputFilepatternProvider()).continuously(
         *       Duration.standardSeconds(30), afterTimeSinceNewOutput(Duration.standardMinutes(5))))
         *     .apply(FileIO.readMatches().withCompression(GZIP));
         *
         * PCollection<Record> output = files.apply(XmlIO.<Record>readFiles()
         *     .withRootElement("root")
         *     .withRecordElement("record")
         *     .withRecordClass(Record.class));
         * }</pre>
         */
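
    To make the "utility wrapper over a `Metadata` and a `Compression`" idea concrete, here is a minimal plain-Java sketch. It is not the code in this PR: `ReadableFileSketch`, its nested `Metadata`, `Compression` and `ReadableFile` types, and the `readFullyAsUtf8` and `writeGzip` helpers are all hypothetical stand-ins that use only the JDK, and only two compression modes are modeled.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Illustrative only: simplified stand-ins, not Beam's actual classes.
final class ReadableFileSketch {

  /** Stand-in for a compression setting; only two modes are modeled. */
  enum Compression { UNCOMPRESSED, GZIP }

  /** Stand-in for match metadata: where the file is and how large it is. */
  static final class Metadata {
    final Path path;
    final long sizeBytes;

    Metadata(Path path, long sizeBytes) {
      this.path = path;
      this.sizeBytes = sizeBytes;
    }
  }

  /** Pairs metadata with a compression mode and knows how to open the file. */
  static final class ReadableFile {
    private final Metadata metadata;
    private final Compression compression;

    ReadableFile(Metadata metadata, Compression compression) {
      this.metadata = metadata;
      this.compression = compression;
    }

    Metadata getMetadata() { return metadata; }

    Compression getCompression() { return compression; }

    /** Opens the file, transparently decompressing it if needed. */
    InputStream open() throws IOException {
      InputStream raw = Files.newInputStream(metadata.path);
      if (compression != Compression.GZIP) {
        return raw;
      }
      try {
        return new GZIPInputStream(raw);
      } catch (IOException e) {
        raw.close(); // do not leak the underlying stream on a bad gzip header
        throw e;
      }
    }

    /** Reads the whole (decompressed) file as a UTF-8 string. */
    String readFullyAsUtf8() throws IOException {
      try (InputStream in = open()) {
        return new String(in.readAllBytes(), StandardCharsets.UTF_8);
      }
    }
  }

  /** Demo helper: writes gzip-compressed UTF-8 text to a path. */
  static void writeGzip(Path path, String text) throws IOException {
    try (OutputStream out = new GZIPOutputStream(Files.newOutputStream(path))) {
      out.write(text.getBytes(StandardCharsets.UTF_8));
    }
  }

  public static void main(String[] args) throws IOException {
    Path tmp = Files.createTempFile("readable-file-sketch", ".gz");
    try {
      writeGzip(tmp, "<root><record/></root>");
      ReadableFile file =
          new ReadableFile(new Metadata(tmp, Files.size(tmp)), Compression.GZIP);
      System.out.println(file.readFullyAsUtf8());
    } finally {
      Files.delete(tmp);
    }
  }
}
```

    The point of the pairing is that downstream readers such as a `readFiles()` transform only need a "readable file" and do not have to care how it was matched or compressed.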
    
    R: @reuvenlax

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jkff/incubator-beam readable-file

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/beam/pull/3799.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3799
    
----
commit a9e3e82cadb5e92b430c632ca1503a45eaa2da6d
Author: Eugene Kirpichov <[email protected]>
Date:   2017-08-24T23:31:41Z

    Moves Match into FileIO.match()/matchAll()
    
    FileIO will later gain other methods, such as read()/write().
    
    Also introduces FileIO.MatchConfiguration - a common type to use
    by various file-based IOs to reduce boilerplate, and uses it in TextIO.

commit ead6af9bf3e2bbe2f53d4e75a4bf0e1a40a92b31
Author: Eugene Kirpichov <[email protected]>
Date:   2017-08-31T23:11:25Z

    Introduces FileIO.read()

commit b36b679682e8dcfa6106083069d83a874fac7228
Author: Eugene Kirpichov <[email protected]>
Date:   2017-08-31T23:28:07Z

    Uses FileIO.read() in TextIO and AvroIO

commit c38482c102871399eb50551a45b7b79ab8e8fc6e
Author: Eugene Kirpichov <[email protected]>
Date:   2017-08-31T23:43:22Z

    Introduces TextIO.readFiles()

commit 3b17c41730a43557d000b6c4662e6572d97fdcd7
Author: Eugene Kirpichov <[email protected]>
Date:   2017-09-01T00:21:20Z

    Introduces XmlIO.readFiles

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
