Christopher Hebert created BEAM-2750:
----------------------------------------
Summary: Read whole files as one element each
Key: BEAM-2750
URL: https://issues.apache.org/jira/browse/BEAM-2750
Project: Beam
Issue Type: New Feature
Components: sdk-java-core
Reporter: Christopher Hebert
Assignee: Davor Bonaci
I'd like to read whole files as one element each.
For example, if my input files are hi.txt, what.txt, and yes.txt, the whole
contents of hi.txt would be one element of the returned PCollection, the whole
contents of what.txt would be the next element, and so on, giving me a
PCollection with three elements.
This contrasts with TextIO, which produces a separate element for every line of
text in the input files.
This read (I'll call it WholeFileIO for now) would work like so:
{code:java}
// Proposed API: each element pairs a file name with that file's full contents.
PCollection<KV<String, byte[]>> fileNamesAndBytes = p.apply("Read",
    WholeFileIO.read().from("/path/to/input/dir/*"));
{code}
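To make the intended semantics concrete, here is a minimal plain-Java sketch of
"one element per file". It is not Beam code and not a proposed implementation:
the class name WholeFileSketch and the method readWholeFiles are hypothetical,
and a Map of file name to bytes stands in for the PCollection of KVs. It only
uses java.nio.file calls from the JDK.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; WholeFileIO itself does not exist yet.
class WholeFileSketch {

    // Returns one entry per regular file in `dir` whose name matches `glob`:
    // key = file name, value = the file's entire contents as bytes.
    static Map<String, byte[]> readWholeFiles(Path dir, String glob) throws IOException {
        Map<String, byte[]> out = new TreeMap<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, glob)) {
            for (Path file : stream) {
                if (Files.isRegularFile(file)) {
                    out.put(file.getFileName().toString(), Files.readAllBytes(file));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // Three multi-line files, mirroring the example in the description.
        Path dir = Files.createTempDirectory("wholefile");
        Files.write(dir.resolve("hi.txt"), "hello\nworld\n".getBytes(StandardCharsets.UTF_8));
        Files.write(dir.resolve("what.txt"), "one\ntwo\nthree\n".getBytes(StandardCharsets.UTF_8));
        Files.write(dir.resolve("yes.txt"), "single line\n".getBytes(StandardCharsets.UTF_8));

        Map<String, byte[]> files = readWholeFiles(dir, "*.txt");
        // One entry per file, regardless of how many lines each file has.
        System.out.println(files.size() + " elements: " + files.keySet());
    }
}
```

A line-oriented read over the same three files would instead yield one element
per line (six in this example), which is the difference from TextIO described
above.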