[jira] Commented: (MAPREDUCE-2171) job recovery mechanism

Kang Xiao (JIRA) Tue, 02 Nov 2010 02:36:56 -0700

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12927341#action_12927341 ]


Kang Xiao commented on MAPREDUCE-2171:
--------------------------------------

The job recovery mechanism is intended to solve three kinds of problems:

# If a long-running job fails, it has to be resubmitted as an entirely new job, 
and all tasks, including those that already succeeded, have to be re-executed.
# If we upgrade a cluster to a new Hadoop version, all running jobs need to be 
re-run.
# If we restart a tasktracker, all running tasks and succeeded maps need to be 
re-executed.

The JobTracker's RecoveryManager solves part of problem 2. However, it only 
re-runs all running jobs automatically; all succeeded tasks still need to be 
re-executed.
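
To make the goal concrete, here is a rough sketch of the selection step such a mechanism would need: given the last recorded state of each task, re-run only those that did not succeed. This is an illustration only. RecoverySketch, TaskState and tasksToRerun are hypothetical names, not existing Hadoop classes, and the sketch assumes that the recorded states survive the failure or restart and that the outputs of succeeded tasks are still available.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these names are Hadoop APIs.
class RecoverySketch {

    // Simplified task states, assumed to be persisted somewhere durable.
    enum TaskState { SUCCEEDED, FAILED, RUNNING, PENDING }

    // Returns the IDs of the tasks that must be executed again on recovery.
    // Succeeded tasks are skipped; failed, running and pending tasks are kept.
    static List<String> tasksToRerun(Map<String, TaskState> recordedStates) {
        List<String> rerun = new ArrayList<>();
        for (Map.Entry<String, TaskState> entry : recordedStates.entrySet()) {
            if (entry.getValue() != TaskState.SUCCEEDED) {
                rerun.add(entry.getKey());
            }
        }
        return rerun;
    }
}
```

With a job whose recorded states are m_0 = SUCCEEDED, m_1 = FAILED and r_0 = RUNNING, only m_1 and r_0 would be scheduled again. Without such a step, all three tasks are re-executed.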

> job recovery mechanism
> ----------------------
>
>                 Key: MAPREDUCE-2171
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2171
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobtracker, tasktracker
>            Reporter: Kang Xiao
>
> A job recovery mechanism to enable a job to re-execute only failed tasks when 
> the job fails or the jobtracker/tasktracker restarts.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
