Christopher Hebert created BEAM-2751:
----------------------------------------
Summary: Write PCollection elements to individual files
Key: BEAM-2751
URL: https://issues.apache.org/jira/browse/BEAM-2751
Project: Beam
Issue Type: New Feature
Components: sdk-java-core
Reporter: Christopher Hebert
Assignee: Davor Bonaci
I'd like to write elements as individual files.
Rather than smashing thousands of outputs into a handful of files as TextIO
does (output-00000-of-00005, output-00001-of-00005,...), I want to write each
element to its own file.
So if I used WholeFileIO from [BEAM-2750] to read in three files (hi.txt,
what.txt, and yes.txt), then I'd like to write the processed contents out to
individual files with user- or data-defined filenames (like hi-modified.txt,
what-modified.txt, and yes-modified.txt).
With a WholeFileIO, this would look like:
{code:java}
PCollection<KV<String, byte[]>> fileNamesAndBytes = p.apply("Read",
    WholeFileIO.read().from("/path/to/input/dir/*"));
...
// Do stuff that changes contents and file names
...
modifiedFileNamesAndBytes.apply("Write",
    WholeFileIO.write().to("/path/to/output/dir/"));
{code}
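To make the requested behavior concrete, here is a minimal plain-Java sketch of what the write step might do for each (file name, contents) element. It is only an illustration under assumptions: PerElementFileWriter, writeElement, and writeAll are hypothetical names, not part of Beam or of the proposed WholeFileIO, and it uses java.nio on a local filesystem rather than a runner-aware file abstraction.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not a Beam API): one output file per element.
public class PerElementFileWriter {

  // Writes one element to outputDir/fileName and returns the path written.
  public static Path writeElement(Path outputDir, String fileName, byte[] contents)
      throws IOException {
    Path root = outputDir.toAbsolutePath().normalize();
    Path target = root.resolve(fileName).normalize();
    // Reject names such as "../x" or "" that do not land strictly inside the output directory.
    if (!target.startsWith(root) || target.equals(root)) {
      throw new IllegalArgumentException("Invalid file name for output directory: " + fileName);
    }
    // Data-defined names may contain subdirectories, so create them as needed.
    Files.createDirectories(target.getParent());
    // Creates the file, or truncates and overwrites it if it already exists.
    return Files.write(target, contents);
  }

  // Writes every (file name, contents) pair to its own file.
  public static List<Path> writeAll(Path outputDir, Map<String, byte[]> elements)
      throws IOException {
    List<Path> written = new ArrayList<>();
    for (Map.Entry<String, byte[]> element : elements.entrySet()) {
      written.add(writeElement(outputDir, element.getKey(), element.getValue()));
    }
    return written;
  }
}
```

In a pipeline, logic like writeElement would presumably run once per KV<String, byte[]> element, with the key supplying the file name. A real implementation would likely go through Beam's FileSystems abstraction instead of java.nio so that non-local output locations also work.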
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)