[ https://issues.apache.org/jira/browse/MAPREDUCE-5079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13608350#comment-13608350 ]
Jason Lowe commented on MAPREDUCE-5079:
---------------------------------------
Thanks for taking a look, Bobby. I'll address the missing {{private}}
directive when I update the patch with more unit tests.
Speaking of testing, I ran the test suite under hadoop-mapreduce-client/
and all the tests passed. I also tested this manually by running sleep and
wordcount jobs, killing the MRAppMaster process with kill -9 while the job was
running, and then watching the logs of the second attempt as it recovered. I
covered normal map/reduce task success, a speculative attempt, and a map that
was re-run after a fetch failure.
> Recovery should restore task state from job history info directly
> -----------------------------------------------------------------
>
> Key: MAPREDUCE-5079
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5079
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: mr-am
> Affects Versions: 0.23.7
> Reporter: Jason Lowe
> Assignee: Jason Lowe
> Attachments: MAPREDUCE-5079.patch
>
>
> We've encountered a lot of hanging issues during MR-AM recovery because the
> state machines don't always end up in the same states after recovery. This
> is especially true when speculative execution is enabled. It should be
> straightforward to restore task and task attempt states directly from the
> TaskInfo and TaskAttemptInfo records in the job history file to avoid relying
> on the task state machines ending up in the proper states with the proper
> number of attempts.
> This should be a more robust solution that would also give us the option of
> recovering start time and log locations for tasks that were in-progress when
> the AM crashed.