[ https://issues.apache.org/jira/browse/CRUNCH-405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065307#comment-14065307 ]

Josh Wills commented on CRUNCH-405:
-----------------------------------

Would it be better to move the information that a PCollectionImpl has been 
materialized into the DistributedPipeline impl? The Executor could hold a 
reference to the Pipeline instance (already true, right?) and then modify the 
Pipeline state only once the plan is actually executed. We might need some 
sync logic in there to make sure two identical plans aren't executed 
simultaneously; there would need to be a way for the execution of one plan to 
invalidate any other plans that had already been created.
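A minimal Java sketch of that idea, just to make the invalidation part concrete. Everything in it is hypothetical: PlanSketch, Pipeline, Plan, write(), commit() and the generation counter are illustrative names, not Crunch's actual MRPipeline/MRExecutor API. The pipeline owns the materialization state, plan() only reads it, and a successful execute() bumps a generation number so any plan built from the older state is rejected.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical sketch (not Crunch's API): the pipeline owns the
 * "materialized" state, plan() is read-only, and executing one plan
 * invalidates every other plan built against the same state.
 */
public class PlanSketch {

  /** A plan snapshot; building or inspecting it never mutates the pipeline. */
  public static final class Plan {
    private final Pipeline owner;
    private final long generation;            // pipeline generation this plan was built from
    private final Set<String> toMaterialize;  // outputs this plan would produce

    Plan(Pipeline owner, long generation, Set<String> toMaterialize) {
      this.owner = owner;
      this.generation = generation;
      this.toMaterialize = Collections.unmodifiableSet(toMaterialize);
    }

    public Set<String> outputs() { return toMaterialize; }

    /** Runs the plan; throws if the pipeline state changed since it was built. */
    public void execute() { owner.commit(this); }
  }

  public static final class Pipeline {
    private final Set<String> pending = new HashSet<>();
    private final Set<String> materialized = new HashSet<>();
    private long generation = 0;

    /** Registers an output; also bumps the generation so older plans go stale. */
    public synchronized void write(String collection) {
      pending.add(collection);
      generation++;
    }

    /** Idempotent: computes what would run without touching pipeline state. */
    public synchronized Plan plan() {
      Set<String> todo = new HashSet<>(pending);
      todo.removeAll(materialized);
      return new Plan(this, generation, todo);
    }

    /** The only place materialization state changes; check and bump happen under one lock. */
    synchronized void commit(Plan plan) {
      if (plan.owner != this || plan.generation != generation) {
        throw new IllegalStateException("plan is stale: another plan already ran");
      }
      // The real jobs would be launched here (omitted in this sketch).
      materialized.addAll(plan.toMaterialize);
      generation++;
    }

    public synchronized boolean isMaterialized(String collection) {
      return materialized.contains(collection);
    }
  }

  public static void main(String[] args) {
    Pipeline pipeline = new Pipeline();
    pipeline.write("out1");
    Plan first = pipeline.plan();
    Plan second = pipeline.plan(); // same state, so same outputs
    System.out.println(first.outputs().equals(second.outputs())); // true
    first.execute();
    try {
      second.execute();
    } catch (IllegalStateException e) {
      System.out.println("second plan rejected");
    }
  }
}
```

Using a counter rather than comparing plan contents means even two identical plans cannot both run: whichever commits first wins under the pipeline's lock, and the other fails fast instead of re-running the same jobs.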

> Explore adding support for idempotent MRPipeline.plan()
> -------------------------------------------------------
>
>                 Key: CRUNCH-405
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-405
>             Project: Crunch
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Micah Whitacre
>            Assignee: Micah Whitacre
>         Attachments: CRUNCH-405_v1.patch
>
>
> While talking through a use case with a consumer, we learned they were 
> interested in being able to call the MRPipeline.plan() method one or more 
> times before ever calling the Pipeline.run/done methods.  The reason was that 
> they wanted to pull information off the MRExecutor to tweak settings inside 
> their DoFns.
> However, the current MRPipeline implementation does not have an idempotent 
> plan() method: it alters internal state, which affects the full run once 
> done() is called.  
> It would be nice to add an idempotent plan() method that could gather this 
> information, or perhaps a reset option.  



--
This message was sent by Atlassian JIRA
(v6.2#6252)