[ https://issues.apache.org/jira/browse/HADOOP-5929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12713850#action_12713850 ]
Devaraj Das commented on HADOOP-5929:
-------------------------------------

Also, we should remove the <date> field from the job history filename format. The date is already present in the jobID, which is itself part of the filename.

> Cleanup JobHistory file naming to do with job recovery
> ------------------------------------------------------
>
> Key: HADOOP-5929
> URL: https://issues.apache.org/jira/browse/HADOOP-5929
> Project: Hadoop Core
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.19.0
> Reporter: Devaraj Das
> Fix For: 0.21.0
>
>
> The JobTracker uses the job history files for doing job recovery upon
> startup. To handle cases where the JobTracker goes down again while the
> recovered job is running, there is some logic that plays with files, and it
> ends up having two history files for some window of time during the life of
> the job: the actual history file and the .recover file. The idea is that upon
> the next restart we should be able to recover the maximal number of events
> for the job. This led to performance problems in job submission / recovery
> (part of which got addressed in HADOOP-4372). It also looks pretty unlikely
> that a running job will survive across multiple JT restarts. Even if it did,
> without the .recover file it would only mean that we lose some tasks that
> completed in a subsequent restart. I propose that we remove the .recover file
> logic and base the recovery only on the original job history file.
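
The comment's point that the date is already carried by the jobID can be illustrated with a small sketch. It assumes job IDs of the form job_<JobTracker identifier>_<sequence>, for example job_200905271030_0001, where the identifier is a yyyyMMddHHmm start timestamp. That layout, the sample ID, and the helper name jobid_timestamp are assumptions made for illustration; none of them is taken from this issue or from the JobHistory code.

```python
from datetime import datetime


def jobid_timestamp(job_id: str) -> datetime:
    """Extract the timestamp embedded in a job ID (hypothetical helper).

    Assumed layout: job_<yyyyMMddHHmm>_<sequence>, e.g. job_200905271030_0001.
    """
    parts = job_id.split("_")
    if len(parts) != 3 or parts[0] != "job" or not parts[2].isdigit():
        raise ValueError(f"unexpected job ID layout: {job_id!r}")
    # The middle field is assumed to be the JobTracker start time.
    return datetime.strptime(parts[1], "%Y%m%d%H%M")
```

If the job ID does follow this layout, a separate <date> field in the history filename would carry no information that cannot be derived from the jobID portion of the same filename.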