[
https://issues.apache.org/jira/browse/MAPREDUCE-693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12726351#action_12726351
]
Amar Kamat commented on MAPREDUCE-693:
--------------------------------------
bq. The old conf files remain in the history folder and fail to be moved to
"done" subdirectory
There is no need to move the conf file to the done folder. In this case the job
is run as a new job and hence a new conf file is created for this job. The
jobhistory file gets deleted as it is required for recovery (checkpoint
process). The conf file doesn't play any role in the recovery process. Here
is what is happening:
# jobtracker starts with id _id1_
# job job1 is submitted and creates history file hostname_id1_job1_user_jobname
and conf file as hostname_id1_job1_conf.xml
# jobtracker restarts with id _id2_
# jobtracker tries to recover the job. There are 2 possibilities here:
## If the job-initialization thread inits the job before the recovery-manager
picks up the job for recovery then the new filename would be
hostname_id1_job1_user_jobname.recover and the conf file would be
hostname_id1_job1_conf.xml. In such a case there won't be any garbage left in
the history folder.
## If the recovery-manager picks up the job before the init-thread then
it will assume that there is nothing to recover and will delete
hostname_id1_job1_user_jobname (leaving hostname_id1_job1_conf.xml). When the
job inits, it will take new filenames, i.e. hostname_id2_job1_user_jobname and
hostname_id2_job1_conf.xml. Only in this case is the old conf file
(hostname_id1_job1_conf.xml) left behind in the history folder.
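The two orderings above can be sketched as a toy model of the history folder. This is only an illustration of the naming scheme described in the steps, not JobTracker code: the function names and the set-based "folder" are hypothetical, and what happens to the original history file in the first case is not modelled.

```python
def history_file(host, jt_id, job, user, jobname):
    # Naming scheme from the steps above: hostname_id1_job1_user_jobname
    return f"{host}_{jt_id}_{job}_{user}_{jobname}"


def conf_file(host, jt_id, job):
    # Naming scheme from the steps above: hostname_id1_job1_conf.xml
    return f"{host}_{jt_id}_{job}_conf.xml"


def leftover_conf_files(init_thread_first):
    """Return conf files left behind after a restart (hypothetical model)."""
    host, job, user, jobname = "hostname", "job1", "user", "jobname"

    # JT runs with id1 and the job is submitted.
    old_history = history_file(host, "id1", job, user, jobname)
    old_conf = conf_file(host, "id1", job)
    folder = {old_history, old_conf}

    # JT restarts with id2.
    if init_thread_first:
        # Case 1: the init thread wins; the job keeps the id1 names,
        # with history continuing in a .recover file.
        folder.add(old_history + ".recover")
        active_conf = old_conf
    else:
        # Case 2: the recovery manager wins; it deletes the old history
        # file but leaves the old conf file, and the job later inits
        # with id2 names.
        folder.discard(old_history)
        folder.add(history_file(host, "id2", job, user, jobname))
        active_conf = conf_file(host, "id2", job)
        folder.add(active_conf)

    # Any conf file that is not the job's current one is garbage.
    return sorted(
        f for f in folder if f.endswith("_conf.xml") and f != active_conf
    )


print(leftover_conf_files(init_thread_first=True))   # []
print(leftover_conf_files(init_thread_first=False))  # ['hostname_id1_job1_conf.xml']
```

Only the second ordering leaves hostname_id1_job1_conf.xml orphaned, which matches the reported symptom.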
AFAIK this is a timing issue. I think a proper fix for all these corner cases is
MAPREDUCE-11. Thoughts?
> Conf files not moved to "done" subdirectory after JT restart
> ------------------------------------------------------------
>
> Key: MAPREDUCE-693
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-693
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: jobtracker
> Affects Versions: 0.20.1
> Reporter: Ramya R
> Priority: Minor
> Fix For: 0.20.1
>
>
> After MAPREDUCE-516, when a job is submitted and the JT is restarted (before
> job files have been written) and the job is killed after recovery, the conf
> files fail to be moved to the "done" subdirectory.
> The exact scenario to reproduce this issue is:
> * Submit a job
> * Restart JT before anything is written to the job files
> * Kill the job
> * The old conf files remain in the history folder and fail to be moved to
> "done" subdirectory