[ https://issues.apache.org/jira/browse/BEAM-2750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Christopher Hebert updated BEAM-2750:
-------------------------------------
Summary: Read whole files as one PCollection element each (was: Read whole
files as one element each)
> Read whole files as one PCollection element each
> ------------------------------------------------
>
> Key: BEAM-2750
> URL: https://issues.apache.org/jira/browse/BEAM-2750
> Project: Beam
> Issue Type: New Feature
> Components: sdk-java-core
> Reporter: Christopher Hebert
> Assignee: Davor Bonaci
>
> I'd like to read whole files as one element each.
> If my input files are hi.txt, what.txt, and yes.txt, then the whole contents
> of hi.txt form one element of the returned PCollection, the whole contents of
> what.txt form another element, and so on, giving me a PCollection with three
> elements.
> This contrasts with TextIO, which produces one element for every line of text
> in the input files.
> This read (I'll call it WholeFileIO for now) would work like so:
> {code:java}
> PCollection<KV<String, byte[]>> fileNamesAndBytes = p.apply("Read",
> WholeFileIO.read().from("/path/to/input/dir/*"));
> {code}
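The requested semantics can be illustrated without Beam. The following is a minimal plain-Java sketch, not a Beam transform and not the proposed WholeFileIO: the class `WholeFileReadSketch` and its methods `readWholeFiles` and `readLines` are hypothetical names, and a local directory plus a `java.nio` glob stand in for Beam's file-pattern matching. Each matching file yields exactly one (file name, whole contents) pair, mirroring the proposed `KV<String, byte[]>` elements, while `readLines` shows the TextIO-style alternative of one element per line. It assumes Java 11 or later (for `String.lines()`).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class WholeFileReadSketch {

  /**
   * Hypothetical stand-in for the proposed WholeFileIO.read().from(...):
   * returns one (file name, whole contents) pair per regular file in dir
   * whose name matches the glob.
   */
  static List<Map.Entry<String, byte[]>> readWholeFiles(Path dir, String glob) {
    List<Map.Entry<String, byte[]>> out = new ArrayList<>();
    try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, glob)) {
      for (Path file : files) {
        if (Files.isRegularFile(file)) {
          // The entire file becomes a single element; nothing is split on newlines.
          out.add(new SimpleImmutableEntry<>(
              file.getFileName().toString(), Files.readAllBytes(file)));
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    // Directory iteration order is unspecified, so sort by name for stable output.
    out.sort(Map.Entry.comparingByKey());
    return out;
  }

  /** For contrast: TextIO-style reading, one element per line of every matching file. */
  static List<String> readLines(Path dir, String glob) {
    List<String> out = new ArrayList<>();
    for (Map.Entry<String, byte[]> file : readWholeFiles(dir, glob)) {
      new String(file.getValue(), StandardCharsets.UTF_8).lines().forEach(out::add);
    }
    return out;
  }

  public static void main(String[] args) throws IOException {
    // Recreate the three example files from the issue in a temporary directory.
    Path dir = Files.createTempDirectory("wholefile");
    Files.write(dir.resolve("hi.txt"), "hello\nthere\n".getBytes(StandardCharsets.UTF_8));
    Files.write(dir.resolve("what.txt"), "what\n".getBytes(StandardCharsets.UTF_8));
    Files.write(dir.resolve("yes.txt"), "yes\nyes\nyes\n".getBytes(StandardCharsets.UTF_8));

    System.out.println("whole-file elements: " + readWholeFiles(dir, "*.txt").size()); // 3
    System.out.println("line elements: " + readLines(dir, "*.txt").size()); // 6
  }
}
```

With three input files the whole-file read produces three elements regardless of how many lines each file has, which is the distinction the issue draws against TextIO. Because every file is held in memory as one `byte[]`, this approach suits many small-to-medium files rather than a few very large ones.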
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)