[
https://issues.apache.org/jira/browse/HADOOP-5929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12713850#action_12713850
]
Devaraj Das commented on HADOOP-5929:
-------------------------------------
Also, we should remove the <date> field from the job history filename format.
The date is already present in the jobID, which is itself part of the
filename.
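
To illustrate why a separate <date> field would be redundant, here is a minimal sketch that pulls the timestamp back out of a job ID. It assumes the job ID has the layout job_<yyyyMMddHHmm>_<sequence>, where the middle component is, to my understanding, derived from the JobTracker's start time. The class name JobIdDate, the method timestampOf, and the sample ID are hypothetical and not part of Hadoop.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class JobIdDate {
    // Assumed timestamp layout of the middle job ID component (an assumption,
    // not taken from the issue text): year, month, day, hour, minute.
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("uuuuMMddHHmm");

    // Hypothetical helper: extracts the timestamp embedded in a job ID of the
    // assumed form job_<yyyyMMddHHmm>_<sequence>.
    static LocalDateTime timestampOf(String jobId) {
        String[] parts = jobId.split("_");
        if (parts.length != 3 || !parts[0].equals("job")) {
            throw new IllegalArgumentException("Unexpected job ID: " + jobId);
        }
        return LocalDateTime.parse(parts[1], STAMP);
    }

    public static void main(String[] args) {
        // Made-up job ID, used only for illustration.
        String jobId = "job_200905271230_0001";
        System.out.println(timestampOf(jobId));
    }
}
```

If the assumption about the layout holds, any filename that embeds the job ID already carries this date, so a second date field adds no information.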
> Cleanup JobHistory file naming to do with job recovery
> ------------------------------------------------------
>
> Key: HADOOP-5929
> URL: https://issues.apache.org/jira/browse/HADOOP-5929
> Project: Hadoop Core
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.19.0
> Reporter: Devaraj Das
> Fix For: 0.21.0
>
>
> The JobTracker uses the job history files to recover jobs upon startup. To
> handle cases where the JobTracker goes down again while a recovered job is
> running, there is some logic that plays with files, and for some window of
> time during the life of the job it ends up with two history files: the
> actual history file and the .recover file. The idea is that upon the next
> restart we should be able to recover the maximal number of events for the
> job. This led to performance problems in job submission / recovery (part of
> which were addressed in HADOOP-4372). It also looks pretty unlikely that a
> running job will survive multiple JT restarts. Even if it did, without the
> .recover file it would only mean that we lose some tasks that completed in a
> subsequent restart. I propose that we remove the .recover file logic and base
> recovery only on the original job history file.