[
https://issues.apache.org/jira/browse/MAPREDUCE-2171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12927341#action_12927341
]
Kang Xiao commented on MAPREDUCE-2171:
--------------------------------------
The job recovery mechanism targets three kinds of problems:
# If a long-running job fails, it has to be re-submitted as an entirely new job,
and all tasks, including succeeded ones, have to be re-executed.
# If we upgrade a cluster to a new Hadoop version, all running jobs need to be
re-run.
# If we restart a tasktracker, all running tasks and succeeded maps need to be
re-executed.
The JobTracker's RecoveryManager solves part of problem 2. However, it just
automatically re-runs all running jobs, so all succeeded tasks still need to be
re-executed.
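A minimal sketch of the difference, assuming the jobtracker persists the IDs of succeeded tasks somewhere durable (e.g. the job history) and reads them back after a restart. The class and method names are hypothetical and are not existing Hadoop APIs:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not Hadoop code: a per-job record of succeeded task IDs
// that survives a restart, so only unfinished tasks are rescheduled.
public class TaskRecoverySketch {

    // Succeeded task IDs; assumed to be reloaded from durable storage on restart.
    private final Set<String> succeeded = new LinkedHashSet<>();

    // Called when a task attempt completes successfully.
    public void recordSuccess(String taskId) {
        succeeded.add(taskId);
    }

    // Returns the tasks that still need to run, preserving the original order.
    public List<String> tasksToRerun(List<String> allTasks) {
        List<String> pending = new ArrayList<>();
        for (String taskId : allTasks) {
            if (!succeeded.contains(taskId)) {
                pending.add(taskId);
            }
        }
        return pending;
    }

    public static void main(String[] args) {
        List<String> allTasks = List.of("m_000000", "m_000001", "m_000002", "r_000000");
        TaskRecoverySketch log = new TaskRecoverySketch();
        log.recordSuccess("m_000000");
        log.recordSuccess("m_000002");
        // Current behaviour: the whole job re-runs, so every task runs again.
        System.out.println("full re-run:     " + allTasks);
        // Proposed behaviour: only tasks without a recorded success run again.
        System.out.println("recovery re-run: " + log.tasksToRerun(allTasks));
    }
}
```

Here only `m_000001` and `r_000000` would be rescheduled. The sketch ignores problem 3: a succeeded map whose output lived on a restarted tasktracker would still have to be treated as not succeeded.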
> job recovery mechanism
> ----------------------
>
> Key: MAPREDUCE-2171
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2171
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: jobtracker, tasktracker
> Reporter: Kang Xiao
>
> A job recovery mechanism to enable a job to re-execute only failed tasks upon
> job failure or jobtracker/tasktracker restart.
--
This message is automatically generated by JIRA.