Jason Lowe created MAPREDUCE-5079:
-------------------------------------
Summary: Recovery should restore task state from job history info directly
Key: MAPREDUCE-5079
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5079
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: mr-am
Affects Versions: 0.23.7
Reporter: Jason Lowe
Assignee: Jason Lowe

We've encountered many hangs during MR-AM recovery because the task state
machines don't always end up in the same states they were in before the AM
restarted. This is especially true when speculative execution is enabled. It
should be straightforward to restore task and task attempt states directly
from the TaskInfo and TaskAttemptInfo records in the job history file, rather
than relying on the task state machines ending up in the proper states with
the proper number of attempts.

This approach should be more robust, and it would also give us the option of
recovering start times and log locations for tasks that were in progress when
the AM crashed.
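
A minimal sketch of the idea, not the actual MR-AM code: the types below
(RecoverySketch, TaskRecord, AttemptRecord, RecoveredTask) are hypothetical,
simplified stand-ins for the TaskInfo and TaskAttemptInfo history records and
for the AM's in-memory task objects. The point is only that each task's state
and its full set of attempts are copied from the recorded history instead of
being rebuilt by replaying events through a state machine.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these types exist in Hadoop.
final class RecoverySketch {
    enum State { RUNNING, SUCCEEDED, FAILED, KILLED }

    // Stand-in for a task attempt record read from the job history file.
    static final class AttemptRecord {
        final String attemptId;
        final State state;
        final long startTime;      // recorded start time of the attempt
        final String logLocation;  // where the attempt's logs were written

        AttemptRecord(String attemptId, State state, long startTime, String logLocation) {
            this.attemptId = attemptId;
            this.state = state;
            this.startTime = startTime;
            this.logLocation = logLocation;
        }
    }

    // Stand-in for a task record read from the job history file.
    static final class TaskRecord {
        final String taskId;
        final State state;
        final List<AttemptRecord> attempts;

        TaskRecord(String taskId, State state, List<AttemptRecord> attempts) {
            this.taskId = taskId;
            this.state = state;
            this.attempts = Collections.unmodifiableList(new ArrayList<>(attempts));
        }
    }

    // Stand-in for the AM's in-memory view of a task after recovery.
    static final class RecoveredTask {
        final String taskId;
        final State state;
        final Map<String, AttemptRecord> attempts = new LinkedHashMap<>();

        RecoveredTask(String taskId, State state) {
            this.taskId = taskId;
            this.state = state;
        }
    }

    // Copy the recorded task state and every recorded attempt as-is, so the
    // recovered task has exactly the state and attempt count from history,
    // including speculative attempts and attempts that were still running.
    static Map<String, RecoveredTask> restore(List<TaskRecord> history) {
        Map<String, RecoveredTask> tasks = new LinkedHashMap<>();
        for (TaskRecord record : history) {
            RecoveredTask task = new RecoveredTask(record.taskId, record.state);
            for (AttemptRecord attempt : record.attempts) {
                task.attempts.put(attempt.attemptId, attempt);
            }
            tasks.put(record.taskId, task);
        }
        return tasks;
    }
}
```

Because the attempts are copied rather than re-derived, a task that had a
killed speculative attempt and a successful one comes back with both attempts
and the recorded final state, and an in-progress attempt keeps its recorded
start time and log location.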