[ 
https://issues.apache.org/jira/browse/HADOOP-5794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712013#action_12712013
 ] 

rahul k singh commented on HADOOP-5794:
---------------------------------------

Analysis of the problem:
When the JobTracker is restarted, RecoveryManager tries to recover the job 
from job history. RecoveryManager instantiates the JobInProgress (JIP) object, 
which sets its startTime to System.currentTimeMillis(). In the JobInProgress 
constructor, the JobStatus startTime is set from the JIP's startTime. 
RecoveryManager then fetches the startTime from job history and updates the 
JIP's startTime, but this change is not propagated to the JobStatus startTime, 
so JobStatus still holds the old (restart-time) value. These job statuses are 
used in JobQueuesManager to categorize jobs by the state they are in. The 
waitingJobs data structure in JobQueuesManager uses startTime in its 
comparator, so the entry for the job in waitingJobs is keyed on the old 
startTime.
Whenever "hadoop job -list" is run, JobTracker's getJobStatus method is 
called, which sets the JobStatus startTime to the JobInProgress startTime. 
At this point the startTime values in JIP and JobStatus are consistent, but 
the startTime value in waitingJobs in JobQueuesManager is stale. Hence, when 
we try to remove finished jobs (completed, killed, or failed; for example, 
after issuing the "hadoop job -kill <>" command) from waitingJobs, nothing is 
removed, because the startTime seen by the comparator has changed.
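
To illustrate the failure mode, here is a minimal standalone sketch. It is not 
the actual JobQueuesManager code: the Key class, the comparator, and the 
method name are hypothetical, and it assumes waitingJobs behaves like a 
TreeMap ordered by startTime. An entry queued under the restart-time startTime 
cannot be found once the lookup uses the startTime recovered from history.

```java
import java.util.Comparator;
import java.util.TreeMap;

// Simplified sketch; Key and the method below are hypothetical stand-ins,
// not classes from the Hadoop code base.
class StaleStartTimeDemo {

    /** Ordering key: a snapshot of the job's startTime plus its id. */
    static final class Key {
        final long startTime;
        final String jobId;

        Key(long startTime, String jobId) {
            this.startTime = startTime;
            this.jobId = jobId;
        }
    }

    /** Orders keys by startTime, breaking ties on job id. */
    static final Comparator<Key> BY_START_TIME =
        Comparator.comparingLong((Key k) -> k.startTime)
                  .thenComparing((Key k) -> k.jobId);

    /**
     * Queues a job under queuedStartTime, then tries to remove it using
     * currentStartTime. Returns true only if an entry was actually removed.
     */
    static boolean removeWithCurrentStartTime(long queuedStartTime,
                                              long currentStartTime) {
        TreeMap<Key, String> waitingJobs = new TreeMap<>(BY_START_TIME);
        waitingJobs.put(new Key(queuedStartTime, "job_1"), "job_1");
        // TreeMap locates entries with the comparator, so a key built from
        // a different startTime never compares equal to the queued one.
        return waitingJobs.remove(new Key(currentStartTime, "job_1")) != null;
    }

    public static void main(String[] args) {
        long restartTime = 5_000L;  // startTime assigned at JobTracker restart
        long historyTime = 1_000L;  // startTime recovered from job history

        // Queued with the restart time, removed with the history time.
        System.out.println(removeWithCurrentStartTime(restartTime, historyTime)); // false
        // Queued and removed with the same startTime.
        System.out.println(removeWithCurrentStartTime(historyTime, historyTime)); // true
    }
}
```

In this sketch the removal only succeeds when the key used for the lookup 
carries the same startTime as the key the entry was queued under.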

> Sometimes job does not get removed from scheduler queue after it is killed
> --------------------------------------------------------------------------
>
>                 Key: HADOOP-5794
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5794
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/capacity-sched
>    Affects Versions: 0.20.0
>            Reporter: Karam Singh
>
> Sometimes when we kill a job, it does not get removed from the waiting queue, 
> even though the job status is "Killed" with Job Setup and Cleanup: "Successful". 
> Also, the JobTracker web UI shows the job under the failed jobs list, and 
> hadoop job -list all and hadoop queue <queuename> -showJobs also show the job 
> with state=5.
> Prior to the kill, the job state was "Running".

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
