[ https://issues.apache.org/jira/browse/CRUNCH-128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13534065#comment-13534065 ]
Josh Wills commented on CRUNCH-128:
-----------------------------------
Thanks Matthias. For the record, I am neutral on having the ParallelDoOperation
object vs. the regular pDo methods with new signatures (with the caveat that we
need a better name than "advancedParallelDo"). The virtue of
ParallelDoOperation is the protection it provides against pDo spiraling out of
control. Thinking about it now, I expect we will eventually need a variation
on this that also accepts some number of PObjects as potential dependencies.
I'm +1 for moving CrunchRuntimeException to o.a.c., and I'm also +1 for
removing sample() and sort() from the PCollection interface, although that one
should be a different JIRA.
> Allow one stage of an MR pipeline to depend on another target being created
> ---------------------------------------------------------------------------
>
> Key: CRUNCH-128
> URL: https://issues.apache.org/jira/browse/CRUNCH-128
> Project: Crunch
> Issue Type: Improvement
> Reporter: Josh Wills
> Attachments: CheckpointingIT.java, CRUNCH-128.patch,
> CRUNCH-128v2.patch, CRUNCH-128-with-op.patch
>
>
> There are a couple of problems (e.g., map-side joins and total orderings)
> where we need to guarantee that one PCollection has been written to the
> FileSystem before another MapReduce pipeline that depends on that file is
> allowed to run. This doesn't fit cleanly into the current set of abstractions
> for Crunch, which is why we force pipelines to execute via the run command to
> guarantee that the files have been created before the second stage is run.
> We should add the ability for a particular PCollection to require that a
> SourceTarget instance has been created before it can be executed, and the
> planner should incorporate this information into the MR pipeline planning
> process.
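
To make the request above concrete, here is a minimal, self-contained sketch of
the ordering constraint the planner would need to honor. It is not Crunch code
and does not use the Crunch API: the StagePlanner class, its addStage and plan
methods, and the stage and target names are all hypothetical, and it assumes
that every required target is written by some stage in the same plan.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical toy planner for illustration only; not part of Crunch. */
public class StagePlanner {
    // Stage name -> names of targets that must exist before the stage may run.
    private final Map<String, Set<String>> requires = new LinkedHashMap<>();
    // Stage name -> name of the target that the stage writes.
    private final Map<String, String> produces = new LinkedHashMap<>();

    /** Registers a stage, the target it writes, and the targets it depends on. */
    public StagePlanner addStage(String stage, String output, String... requiredTargets) {
        produces.put(stage, output);
        requires.put(stage, new HashSet<>(Arrays.asList(requiredTargets)));
        return this;
    }

    /** Returns the stages in an order where each required target is written first. */
    public List<String> plan() {
        List<String> order = new ArrayList<>();
        Set<String> written = new HashSet<>();
        Set<String> pending = new LinkedHashSet<>(requires.keySet());
        while (!pending.isEmpty()) {
            boolean progressed = false;
            for (Iterator<String> it = pending.iterator(); it.hasNext();) {
                String stage = it.next();
                // A stage is runnable once all of its required targets exist.
                if (written.containsAll(requires.get(stage))) {
                    order.add(stage);
                    written.add(produces.get(stage));
                    it.remove();
                    progressed = true;
                }
            }
            if (!progressed) {
                // Either a cycle, or a target that no stage in the plan writes.
                throw new IllegalStateException("Unsatisfiable target dependencies: " + pending);
            }
        }
        return order;
    }
}
```

For example, registering a "join" stage that requires a "small-side" target
before registering the stage that writes "small-side" still yields a plan in
which the writing stage runs first, which is the guarantee a map-side join
needs without forcing an explicit run between the two stages.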