[ 
https://issues.apache.org/jira/browse/CRUNCH-405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14087086#comment-14087086
 ] 

Micah Whitacre commented on CRUNCH-405:
---------------------------------------

Thanks for taking a stab at this; things have been a bit busy...

* In MRPipeline we should probably mark "currentExecutor" as volatile to 
ensure the main thread sees the state change when currentExecutor gets set back 
to null. (not related to these changes necessarily)
* In the "testPlanDryRunTrue" test we call cleanup after the dry run.  You can 
remove it and the tests still pass.  Ideally we shouldn't require that so 
removing it will help prevent us from reintroducing it.  Or should we move the 
cleanup to the executor like I mentioned above?
* Should pipeline.cleanup(true) reset the currentExecutor?
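
To illustrate the visibility concern in the first bullet: in Java, the keyword that guarantees a write on one thread is seen by later reads on another is volatile (transient only affects serialization). The following is a minimal sketch, not the actual MRPipeline code. PipelineSketch, runInBackground and hasPendingExecutor are hypothetical names, and the real executor is replaced by a plain Object.

```java
// Hypothetical stand-in for MRPipeline; every name except currentExecutor is
// invented for this sketch and is not the Crunch API.
class PipelineSketch {
    // volatile: a write made by the background thread is guaranteed to be
    // visible to later reads on the main thread. (transient would only
    // exclude the field from Java serialization.)
    private volatile Object currentExecutor;

    // Refuses to plan while a previously planned executor is still pending.
    Object plan() {
        if (currentExecutor != null) {
            throw new IllegalStateException(
                "Cannot plan another executable MapReduce job until the existing MRExecutor has been run");
        }
        currentExecutor = new Object(); // placeholder for a real executor
        return currentExecutor;
    }

    // Simulates the run finishing on another thread and clearing the field.
    void runInBackground() {
        Thread worker = new Thread(() -> currentExecutor = null);
        worker.start();
    }

    boolean hasPendingExecutor() {
        return currentExecutor != null;
    }

    public static void main(String[] args) {
        PipelineSketch pipeline = new PipelineSketch();
        pipeline.plan();
        pipeline.runInBackground();
        // Because the field is volatile, this loop is guaranteed to observe
        // the null written by the worker thread and terminate.
        while (pipeline.hasPendingExecutor()) {
            Thread.yield();
        }
        pipeline.plan(); // succeeds now that the field has been cleared
        System.out.println("planned again after the executor was cleared");
    }
}
```

Without volatile (or some other synchronization), the Java memory model does not guarantee that the polling thread ever sees the field go back to null.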

Running your patch on master gives the following.
{code}
Tests in error:
  unionWriteShouldNotThrowNPE[2](org.apache.crunch.impl.dist.collect.UnionCollectionIT): Cannot plan another executable MapReduce job until the existing MRExecutor has been run
{code}

> Explore adding support for idempotent MRPipeline.plan()
> -------------------------------------------------------
>
>                 Key: CRUNCH-405
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-405
>             Project: Crunch
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Micah Whitacre
>            Assignee: Micah Whitacre
>         Attachments: CRUNCH-405.patch, CRUNCH-405_v1.patch
>
>
> Talking through a use case with a consumer, they were interested in having 
> the ability to run the MRPipeline.plan() method one to many times prior to 
> ever calling the Pipeline.run/done methods.  The reason for this was they 
> were looking at pulling information off the MRExecutor to tweak settings 
> inside of their DoFns.
> Currently the MRPipeline implementation however does not have an idempotent 
> plan() method as it alters the state of internal values therefore affecting 
> the full run once done() is called.  
> It would be nice if we added an idempotent plan() method that could gather 
> this information or perhaps a reset option.  



--
This message was sent by Atlassian JIRA
(v6.2#6252)
