[ 
https://issues.apache.org/jira/browse/BEAM-2750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christopher Hebert updated BEAM-2750:
-------------------------------------
    Description: 
I'd like to read whole files as one element each.

If my input files are hi.txt, what.txt, and yes.txt, then the whole contents of 
hi.txt are an element of the returned PCollection, the whole contents of 
what.txt are the next element, etc., giving me a PCollection with three 
elements.

This contrasts with TextIO, which reads a new element for every line of text in 
the input files.

This read (I'll call it WholeFileIO for now) would work like so:

{code:java}
PCollection<KV<String, Byte[]>> fileNamesAndBytes = p.apply("Read", 
WholeFileIO.read().from("/path/to/input/dir/*"));
{code}
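
To make the intended semantics concrete, here is a minimal plain-Java sketch of "one element per file", using only java.nio and no Beam classes. It is not the proposed WholeFileIO implementation: the class name WholeFileReadSketch and the method readWholeFiles are hypothetical, the Map stands in for the PCollection of KV pairs, and it uses the primitive byte[] rather than the Byte[] shown above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only: models the proposed read semantics locally.
public class WholeFileReadSketch {

    /**
     * Returns one entry per regular file in dir whose name matches glob.
     * Key: file name. Value: the whole contents of that file.
     */
    public static Map<String, byte[]> readWholeFiles(Path dir, String glob) throws IOException {
        Map<String, byte[]> fileNamesAndBytes = new TreeMap<>();
        try (DirectoryStream<Path> matches = Files.newDirectoryStream(dir, glob)) {
            for (Path file : matches) {
                if (Files.isRegularFile(file)) {
                    // The entire file becomes a single element, regardless of line breaks.
                    fileNamesAndBytes.put(file.getFileName().toString(), Files.readAllBytes(file));
                }
            }
        }
        return fileNamesAndBytes;
    }

    public static void main(String[] args) throws IOException {
        // Create the three example files from the description in a temp directory.
        Path dir = Files.createTempDirectory("whole-file-sketch");
        Files.write(dir.resolve("hi.txt"), "hi\nthere\n".getBytes(StandardCharsets.UTF_8));
        Files.write(dir.resolve("what.txt"), "what\n".getBytes(StandardCharsets.UTF_8));
        Files.write(dir.resolve("yes.txt"), "yes\nyes\nyes\n".getBytes(StandardCharsets.UTF_8));

        Map<String, byte[]> elements = readWholeFiles(dir, "*");

        // Whole-file reading: the element count equals the file count.
        System.out.println("whole-file elements: " + elements.size());

        // Line-oriented reading (the TextIO-style contrast): one element per line.
        long lineElements = 0;
        for (String name : elements.keySet()) {
            lineElements += Files.readAllLines(dir.resolve(name), StandardCharsets.UTF_8).size();
        }
        System.out.println("line elements: " + lineElements);
    }
}
```

The sketch reads each file fully into memory, so it only suits files small enough to fit in memory as a single element.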

This ticket complements [BEAM-2751].


  was:
I'd like to read whole files as one input each.

If my input files are hi.txt, what.txt, and yes.txt, then the whole contents of 
hi.txt are an element of the returned PCollection, the whole contents of 
what.txt are the next element, etc., giving me a PCollection with three 
elements.

This contrasts with TextIO, which reads a new element for every line of text in 
the input files.

This read (I'll call it WholeFileIO for now) would work like so:

{code:java}
PCollection<KV<String, Byte[]>> fileNamesAndBytes = p.apply("Read", 
WholeFileIO.read().from("/path/to/input/dir/*"));
{code}

This ticket complements [BEAM-2751].



> Read whole files as one PCollection element each
> ------------------------------------------------
>
>                 Key: BEAM-2750
>                 URL: https://issues.apache.org/jira/browse/BEAM-2750
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-java-core
>            Reporter: Christopher Hebert
>            Assignee: Davor Bonaci
>
> I'd like to read whole files as one element each.
> If my input files are hi.txt, what.txt, and yes.txt, then the whole contents 
> of hi.txt are an element of the returned PCollection, the whole contents of 
> what.txt are the next element, etc., giving me a PCollection with three 
> elements.
> This contrasts with TextIO, which reads a new element for every line of text 
> in the input files.
> This read (I'll call it WholeFileIO for now) would work like so:
> {code:java}
> PCollection<KV<String, Byte[]>> fileNamesAndBytes = p.apply("Read", 
> WholeFileIO.read().from("/path/to/input/dir/*"));
> {code}
> This ticket complements [BEAM-2751].



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
