Davor Bonaci reassigned BEAM-2750:

    Assignee: Christopher Hebert  (was: Davor Bonaci)

> Read whole files as one PCollection element each
> ------------------------------------------------
>                 Key: BEAM-2750
>                 URL: https://issues.apache.org/jira/browse/BEAM-2750
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-java-core
>            Reporter: Christopher Hebert
>            Assignee: Christopher Hebert
> I'd like to read whole files as one element each.
> If my input files are hi.txt, what.txt, and yes.txt, then the whole contents 
> of hi.txt are an element of the returned PCollection, the whole contents of 
> what.txt are the next element, etc., giving me a PCollection with three 
> elements.
> This contrasts with TextIO, which emits a new element for every line of text 
> in the input files.
> This read (I'll call it WholeFileIO for now) would work like so:
> {code:java}
> PCollection<KV<String, byte[]>> fileNamesAndBytes = p.apply("Read", 
> WholeFileIO.read().from("/path/to/input/dir/*"));
> {code}
> The example above yields each file's name paired with its raw contents.
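
To make the intended semantics concrete, here is a minimal local sketch in plain Java. It does not use Beam, and WholeFileIO does not exist yet; the class name `WholeFileSketch` and the method `readWholeFiles` are hypothetical names chosen for illustration. It only shows the "one element per file" behavior described above, mapping each file name to that file's raw bytes.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;

/** Local, non-Beam illustration of "one element per file" semantics (hypothetical names). */
final class WholeFileSketch {

    /** Reads each regular file in dir that matches glob; the result has one entry per file. */
    static Map<String, byte[]> readWholeFiles(Path dir, String glob) throws IOException {
        Map<String, byte[]> fileNamesAndBytes = new TreeMap<>();
        try (DirectoryStream<Path> matches = Files.newDirectoryStream(dir, glob)) {
            for (Path file : matches) {
                if (Files.isRegularFile(file)) {
                    // The whole file becomes a single value, however many lines it has.
                    fileNamesAndBytes.put(file.getFileName().toString(), Files.readAllBytes(file));
                }
            }
        }
        return fileNamesAndBytes;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("whole-file-sketch");
        Files.write(dir.resolve("hi.txt"), "hi\nthere\n".getBytes(StandardCharsets.UTF_8));
        Files.write(dir.resolve("what.txt"), "what\n".getBytes(StandardCharsets.UTF_8));
        Files.write(dir.resolve("yes.txt"), "yes\nyes\nyes\n".getBytes(StandardCharsets.UTF_8));

        // Three files in, three entries out, regardless of line count.
        readWholeFiles(dir, "*").forEach((name, bytes) ->
            System.out.println(name + " -> " + bytes.length + " bytes"));
    }
}
```

A line-oriented read of the same directory would produce six elements for these inputs (two, one, and three lines), whereas this sketch produces three.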
> Alternatively, we could pass a PCollection of some sort of FileWrapper around 
> an InputStream to support lazy loading.
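
One possible shape for the lazy-loading alternative is sketched below. This is an assumption about what "some sort of FileWrapper" could look like, not a proposed or existing Beam class; the method names `getFileName` and `open` are hypothetical. The wrapper holds only the path and reads nothing until the caller opens the stream.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical wrapper: carries a file reference and opens its contents on demand. */
final class FileWrapper {
    private final Path path;

    FileWrapper(Path path) {
        this.path = path;
    }

    String getFileName() {
        return path.getFileName().toString();
    }

    /** No bytes are read until the caller opens and consumes this stream. */
    InputStream open() throws IOException {
        return Files.newInputStream(path);
    }
}
```

The trade-off is that a downstream step decides when, and whether, to pay for reading a large file. A real PCollection element would also need a Coder, which this sketch ignores.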
> This ticket complements [BEAM-2751].

This message was sent by Atlassian JIRA
