Dave Beech created CRUNCH-218:
---------------------------------
Summary: Add new Target.WriteMode to skip the write and continue
pipeline if an output target exists
Key: CRUNCH-218
URL: https://issues.apache.org/jira/browse/CRUNCH-218
Project: Crunch
Issue Type: Improvement
Components: Core
Affects Versions: 0.6.0
Reporter: Dave Beech
Assignee: Josh Wills
Priority: Minor
Quite often I write pipelines which persist data to the filesystem midway
through the process, and then carry on doing further work.
If this intermediate data is already present, I think it would be good if I
could set a write mode which skips the write, and with it the upstream
processing that produces the data, and then continues the pipeline from the
existing output. This way I'd avoid running jobs unnecessarily and wasting
cluster resources regenerating data I already have.
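Example (WriteMode.SKIP_IF_EXISTS is the proposed new mode):
PCollection<B> inter =
    pipeline.read(source).parallelDo(something).parallelDo(somethingElse);
// If "output" already exists, skip the write and the stages above.
inter.write(At.sequenceFile("output"), WriteMode.SKIP_IF_EXISTS);
// "final" is a reserved word in Java, so the variable is named "result".
PCollection<C> result = inter.parallelDo(moreWork);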
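To make the requested semantics concrete, here is a minimal sketch outside Crunch, using only java.nio.file. The class SkipIfExistsSketch and the method writeOrReuse are hypothetical names made up for illustration and are not part of the Crunch API. It only shows the intended behaviour: the expensive upstream step runs when the target is missing, and is skipped when the target is already there.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class SkipIfExistsSketch {

    /**
     * Hypothetical helper, not Crunch API. Returns the data stored at target.
     * The upstream supplier is invoked only when target does not exist yet;
     * otherwise the existing contents are read back and upstream is skipped.
     */
    static String writeOrReuse(Path target, Supplier<String> upstream) throws IOException {
        if (Files.exists(target)) {
            // Target already present: skip the write and the work behind it.
            return Files.readString(target);
        }
        // Target missing: run the upstream work and persist its result.
        String data = upstream.get();
        Files.writeString(target, data);
        return data;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("skip-sketch");
        Path target = dir.resolve("output");

        // Counts how many times the "expensive" upstream step actually runs.
        AtomicInteger runs = new AtomicInteger();
        Supplier<String> upstream = () -> {
            runs.incrementAndGet();
            return "intermediate data";
        };

        String first = writeOrReuse(target, upstream);   // runs upstream, writes
        String second = writeOrReuse(target, upstream);  // reuses existing file

        System.out.println("same data: " + first.equals(second));
        System.out.println("upstream runs: " + runs.get());
    }
}
```

In a real pipeline the check would have to happen at planning time, so that the jobs feeding the target are never scheduled. The sketch does not cover that part, nor does it check whether the existing output is stale relative to its inputs.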