[ 
https://issues.apache.org/jira/browse/CRUNCH-128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13533514#comment-13533514
 ] 

Gabriel Reid commented on CRUNCH-128:
-------------------------------------

About the additional methods on the interface: OK, good point, I hadn't even 
come close to thinking of that angle. Another option I just thought of to keep 
the number of parallelDo methods down is to add a varargs parameter to the 
current parallelDo methods instead of overloading them -- that would remain 
backwards compatible, and then we'd still only have four forms of parallelDo.

On the one hand, it could definitely be confusing to have a varargs argument 
there that nobody is ever going to use, while on the other hand, having six 
versions of parallelDo is also probably confusing (or maybe I'm just making 
too big a deal out of it). I think my biggest issue with it is just around code 
completion in an IDE, but I'm also trying to think of what is going to be the 
simplest to understand if you're looking at API docs.
                
> Allow one stage of an MR pipeline to depend on another target being created
> ---------------------------------------------------------------------------
>
>                 Key: CRUNCH-128
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-128
>             Project: Crunch
>          Issue Type: Improvement
>            Reporter: Josh Wills
>         Attachments: CheckpointingIT.java, CRUNCH-128.patch, 
> CRUNCH-128v2.patch
>
>
> There are a couple of problems (e.g., mapside-joins, total orderings, etc.) 
> where we need to guarantee that one PCollection has been written to the 
> FileSystem before another MapReduce pipeline that depends on that file is 
> allowed to run. This doesn't fit cleanly into the current set of abstractions 
> for Crunch, which is why we force pipelines to execute via the run command to 
> guarantee that the files have been created before the second stage is run.
> We should add the ability for a particular PCollection to require that a 
> SourceTarget instance has been created before it can be executed, and the 
> planner should incorporate this information into the MR pipeline planning 
> process.
