[ https://issues.apache.org/jira/browse/MAPREDUCE-873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12749849#action_12749849 ]
Hudson commented on MAPREDUCE-873:
----------------------------------

Integrated in Hadoop-Mapreduce-trunk-Commit #9 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/9/])
    Moving the CHANGES.txt comment to Incompatible section.
    Simplify job recovery. InComplete jobs are resubmitted on jobtracker restart. Contributed by Sharad Agarwal.

> Simplify Job Recovery
> ---------------------
>
>                 Key: MAPREDUCE-873
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-873
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobtracker
>    Affects Versions: 0.20.1
>            Reporter: Devaraj Das
>            Assignee: Sharad Agarwal
>             Fix For: 0.21.0
>
>         Attachments: 873_v1.patch, 873_v2.patch, 873_v3.patch
>
>
> On a couple of occasions we have seen the JobTracker not being able to handle
> job recovery well, and leading to cluster downtime after a restart. The
> current design for handling job recovery is complex and prone to corner cases
> not being handled well enough. In retrospect, it seems like the transaction
> log based approach as was proposed on HADOOP-3245
> (http://tinyurl.com/luh9hb), would have been a better/simpler model. However,
> that is a big project, and it seems for the medium term, just handling job
> re-submissions after a restart is a good tradeoff. That is, the JobTracker
> after getting restarted, will resubmit all jobs that were running in its past
> life. They will all start from the beginning (downside is completed tasks
> will reexecute). In the long term, the transaction log model or some variant
> of that should be pursued.
> Thoughts/comments welcome.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
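
For readers skimming the issue, the resubmit-on-restart idea in the description can be sketched roughly as below. This is only an illustration of the tradeoff, not the code in the attached patches: ToyJobTracker, the persistedJobs map, and every method name are hypothetical and are not Hadoop APIs. The sketch assumes only a job's identity and configuration survive a restart, so a recovered job begins again with no completed tasks.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the JobTracker; none of these names are Hadoop APIs.
class ToyJobTracker {
    // Assumed durable store (job id -> job configuration) that survives a restart.
    private final Map<String, String> persistedJobs;
    // In-memory task progress; a restarted tracker starts with this empty.
    private final Map<String, Integer> completedTasks = new LinkedHashMap<>();

    ToyJobTracker(Map<String, String> persistedJobs) {
        this.persistedJobs = persistedJobs;
    }

    void submit(String jobId, String jobConf) {
        persistedJobs.put(jobId, jobConf); // remember the job until it finishes
        completedTasks.put(jobId, 0);      // every submission starts from the beginning
    }

    void taskCompleted(String jobId) {
        completedTasks.merge(jobId, 1, Integer::sum);
    }

    void jobFinished(String jobId) {
        persistedJobs.remove(jobId);       // finished jobs are never resubmitted
        completedTasks.remove(jobId);
    }

    // Returns -1 for jobs this tracker instance does not know about.
    int completedTaskCount(String jobId) {
        return completedTasks.getOrDefault(jobId, -1);
    }

    // Called once after a restart: resubmit every job that was still incomplete.
    List<String> recover() {
        List<String> resubmitted = new ArrayList<>();
        // Iterate over a copy because submit() writes to persistedJobs.
        for (Map.Entry<String, String> e : new ArrayList<>(persistedJobs.entrySet())) {
            submit(e.getKey(), e.getValue());
            resubmitted.add(e.getKey());
        }
        return resubmitted;
    }
}
```

Nothing about task progress is persisted in this sketch, which is what keeps recovery simple and also why completed tasks would rerun after a restart.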