[ 
https://issues.apache.org/jira/browse/CRUNCH-405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095773#comment-14095773
 ] 

Allan Shoup commented on CRUNCH-405:
------------------------------------

I'm just starting to look into these parts of the code, so forgive my naiveté. 
It is not intuitive to me what the dryRun mode does or why it is needed.

Without knowing how the current code might make this difficult, here's a 
stab at what might be a more intuitive structure. The plan method would return 
a Plan object, which would be tied to the state of the system when the plan was 
generated. You would then pass a plan object to the executor, which would then 
execute the plan (and manipulate any system state needed). If the system state 
was modified (via some parallelDo or write) after a plan was generated but 
before that plan was executed, the previously generated plan would be stale 
and would no longer be executable.
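
To make the idea concrete, here is a rough sketch of that structure in Java. 
Everything in it is hypothetical: SketchPipeline, Plan, addStage and the 
version counter are made-up names for illustration and are not part of the 
Crunch API, and a real implementation would also have to tie each plan to the 
pipeline instance that produced it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Immutable snapshot of what would run, tagged with the state version it was built from. */
final class Plan {
    final long version;
    final List<String> stages;

    Plan(long version, List<String> stages) {
        this.version = version;
        this.stages = Collections.unmodifiableList(new ArrayList<>(stages));
    }
}

/** Hypothetical stand-in for a pipeline; NOT the real MRPipeline. */
final class SketchPipeline {
    private final List<String> stages = new ArrayList<>();
    private long version = 0; // bumped on every state change

    /** Stand-in for parallelDo/write: any mutation makes older plans stale. */
    void addStage(String name) {
        stages.add(name);
        version++;
    }

    /** Idempotent: copies the pending stages and changes no pipeline state. */
    Plan plan() {
        return new Plan(version, stages);
    }

    /** Runs a plan only if the pipeline is unchanged since the plan was generated. */
    List<String> execute(Plan plan) {
        if (plan.version != version) {
            throw new IllegalStateException("stale plan: pipeline changed after plan()");
        }
        List<String> executed = new ArrayList<>(plan.stages);
        stages.clear(); // execution is the only place state is consumed
        version++;
        return executed;
    }
}
```

With this shape, plan() can be called any number of times without affecting 
a later run, and a plan generated before an addStage call is rejected by 
execute() instead of silently running against a changed pipeline.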

So, given that there was a lot of hand-waving there, and the current setup will 
probably not be amenable to something like that, perhaps some javadoc would 
help clarify how the system is expected to function.

> Explore adding support for idempotent MRPipeline.plan()
> -------------------------------------------------------
>
>                 Key: CRUNCH-405
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-405
>             Project: Crunch
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Micah Whitacre
>            Assignee: Micah Whitacre
>         Attachments: CRUNCH-405.patch, CRUNCH-405_v1.patch, 
> CRUNCH-405b.patch, CRUNCH-405c.patch
>
>
> Talking through a use case with a consumer, they were interested in having 
> the ability to run the MRPipeline.plan() method one to many times prior to 
> ever calling the Pipeline.run/done methods.  The reason for this was they 
> were looking at pulling information off the MRExecutor to tweak settings 
> inside of their DoFns.
> Currently, however, the MRPipeline implementation does not have an 
> idempotent plan() method: it alters the state of internal values, which 
> affects the full run once done() is called.  
> It would be nice if we added an idempotent plan() method that could gather 
> this information or perhaps a reset option.  



--
This message was sent by Atlassian JIRA
(v6.2#6252)
