[ 
https://issues.apache.org/jira/browse/CRUNCH-128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13534065#comment-13534065
 ] 

Josh Wills commented on CRUNCH-128:
-----------------------------------

Thanks Matthias. For the record, I am neutral on having the ParallelDoOperation 
object vs. the regular pDo methods with new signatures (with the caveat that we 
need a better name than "advancedParallelDo"). The virtue of 
ParallelDoOperation is the protection it provides against pDo spiraling out of 
control. Thinking about this now, I expect we will also need a variation on 
this that accepts some number of PObjects as potential dependencies.

I'm +1 for moving CrunchRuntimeException to o.a.c., and I'm also +1 for 
removing sample() and sort() from the PCollection interface, although that one 
should be a different JIRA.
                
> Allow one stage of an MR pipeline to depend on another target being created
> ---------------------------------------------------------------------------
>
>                 Key: CRUNCH-128
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-128
>             Project: Crunch
>          Issue Type: Improvement
>            Reporter: Josh Wills
>         Attachments: CheckpointingIT.java, CRUNCH-128.patch, 
> CRUNCH-128v2.patch, CRUNCH-128-with-op.patch
>
>
> There are a couple of problems (e.g., map-side joins and total orderings) 
> where we need to guarantee that one PCollection has been written to the 
> FileSystem before another MapReduce pipeline that depends on that file is 
> allowed to run. This doesn't fit cleanly into the current set of abstractions 
> for Crunch, which is why we force pipelines to execute via the run command to 
> guarantee that the files have been created before the second stage is run.
> We should add the ability for a particular PCollection to require that a 
> SourceTarget instance has been created before it can be executed, and the 
> planner should incorporate this information into the MR pipeline planning 
> process.
