[ 
https://issues.apache.org/jira/browse/MAPREDUCE-693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12726351#action_12726351
 ] 

Amar Kamat commented on MAPREDUCE-693:
--------------------------------------

bq.  The old conf files remain in the history folder and fail to be moved to 
"done" subdirectory
There is no need to move the conf file to the done folder. In this case the job 
is run as a new job and hence a new conf file is created for it. The 
jobhistory file gets deleted because it is the file required for recovery 
(checkpoint process); the conf file doesn't play any role in the recovery 
process. Here is what is happening:
# jobtracker starts with id _id1_
# job job1 is submitted and creates history file hostname_id1_job1_user_jobname 
and conf file as hostname_id1_job1_conf.xml
# jobtracker restarts with id _id2_
# jobtracker tries to recover the job. There are 2 possibilities here :
 ## If the job-initialization thread inits the job before the recovery-manager 
picks up the job for recovery, then the new filename would be 
hostname_id1_job1_user_jobname.recover and the conf file would be 
hostname_id1_job1_conf.xml. In such a case there won't be any garbage left in 
the history folder.
 ## If the recovery-manager picks up the job before the init-thread, then 
it will assume that there is nothing to recover and will delete 
hostname_id1_job1_user_jobname (leaving hostname_id1_job1_conf.xml). When the 
job inits, it will take new filenames, i.e. hostname_id2_job1_user_jobname and 
hostname_id2_job1_conf.xml. Only in this case is the conf file 
(hostname_id1_job1_conf.xml) left behind in the history folder.
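
To make the two orderings easier to compare, here is a rough sketch that models the history folder as a set of filenames. It is not JobTracker code: the helper names (history_name, conf_name, leftover_files) are made up for illustration, and a file is treated as "left behind" simply when its name does not carry the jobtracker id that the job ends up using.

```python
HOST, JOB, USER, JOBNAME = "hostname", "job1", "user", "jobname"


def history_name(jt_id):
    # Hypothetical helper: history filename pattern as described in the comment.
    return f"{HOST}_{jt_id}_{JOB}_{USER}_{JOBNAME}"


def conf_name(jt_id):
    # Hypothetical helper: conf filename pattern as described in the comment.
    return f"{HOST}_{jt_id}_{JOB}_conf.xml"


def leftover_files(recovery_manager_first):
    """Return files in the history folder not named after the id the job uses."""
    # State written by the first jobtracker (id1) when job1 was submitted.
    folder = {history_name("id1"), conf_name("id1")}

    if recovery_manager_first:
        # Recovery manager finds nothing to recover and deletes the old
        # history file; the conf file is not touched.
        folder.discard(history_name("id1"))
        # The job then inits as a new job under the restarted jobtracker (id2).
        job_jt_id = "id2"
        folder |= {history_name("id2"), conf_name("id2")}
    else:
        # Init thread runs first: the id1-based names are reused and the
        # history is written to a ".recover" file.
        job_jt_id = "id1"
        folder.add(history_name("id1") + ".recover")

    # Anything not carrying the job's jobtracker id is treated as garbage.
    return sorted(f for f in folder if f"_{job_jt_id}_" not in f)


if __name__ == "__main__":
    print("init thread first:      ", leftover_files(False))
    print("recovery manager first: ", leftover_files(True))
```

Under these assumptions only the second ordering reports a leftover, and that leftover is the id1 conf file.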

AFAIK this is a timing issue. I think a proper fix for all these corner cases 
is MAPREDUCE-11. Thoughts?

> Conf files not moved to "done" subdirectory after JT restart
> ------------------------------------------------------------
>
>                 Key: MAPREDUCE-693
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-693
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>    Affects Versions: 0.20.1
>            Reporter: Ramya R
>            Priority: Minor
>             Fix For: 0.20.1
>
>
> After MAPREDUCE-516, when a job is submitted and the JT is restarted (before 
> job files have been written) and the job is killed after recovery, the conf 
> files fail to be moved to the "done" subdirectory.
> The exact scenario to reproduce this issue is:
> * Submit a job
> * Restart JT before anything is written to the job files
> * Kill the job
> * The old conf files remain in the history folder and fail to be moved to 
> "done" subdirectory

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.