[ https://issues.apache.org/jira/browse/CRUNCH-405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14087109#comment-14087109 ]
Josh Wills commented on CRUNCH-405:
-----------------------------------

Hey Micah-- I'm fine w/the changes, but the test failure is harder for me to deal with. I don't get it locally, which makes me think it's a timing problem w/respect to when the post-processing thread is run, which isn't great. I don't have a great way to fix it w/o passing access to the MRPipeline object all the way down into the MRExecutor so that it can tell the pipeline to release its resources as soon as the MR jobs finish executing. Any ideas?

> Explore adding support for idempotent MRPipeline.plan()
> -------------------------------------------------------
>
>                 Key: CRUNCH-405
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-405
>             Project: Crunch
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Micah Whitacre
>            Assignee: Micah Whitacre
>         Attachments: CRUNCH-405.patch, CRUNCH-405_v1.patch
>
>
> Talking through a use case with a consumer, they were interested in having
> the ability to run the MRPipeline.plan() method one or more times prior to
> ever calling the Pipeline.run/done methods. The reason for this was they
> were looking at pulling information off the MRExecutor to tweak settings
> inside of their DoFns.
> Currently, however, the MRPipeline implementation does not have an idempotent
> plan() method, as it alters the state of internal values, thereby affecting
> the full run once done() is called.
> It would be nice if we added an idempotent plan() method that could be used
> to gather this information, or perhaps a reset option.

--
This message was sent by Atlassian JIRA
(v6.2#6252)