Job always stays in 'Pending' status and cannot finish for several days
-----------------------------------------------------------------------

                 Key: MAPREDUCE-3362
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3362
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: jobhistoryserver, jobtracker
    Affects Versions: 0.20.2
            Reporter: Denny Ye
            Priority: Critical


Our jobs stay in 'pending' status for several days. We checked the 
jobtracker log and found that one task attempt failed because it could not 
store job history to HDFS. 

The problem starts when another job removes the folder to which this job is 
writing its history log. In that case a 'ConcurrentModificationException' is 
thrown in JobHistory#log(ArrayList<PrintWriter> writers, RecordTypes 
recordType, Keys[] keys, String[] values, JobID id). One thread checked for 
an output error and removed the writer, because the history folder on HDFS 
had been deleted; another thread then got 'ConcurrentModificationException' 
because the current 'writers' list had become empty.
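
To illustrate the failure mode described above, here is a minimal sketch in 
plain Java. It is not the Hadoop code: the class name, method names and the 
String entries standing in for PrintWriter objects are all hypothetical. It 
only shows that removing an entry from an ArrayList while another loop is 
still iterating over it raises 'ConcurrentModificationException', and that 
iterating over a snapshot avoids the exception. The snapshot variant is one 
possible mitigation, not necessarily the change made in Hadoop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;

// Hypothetical stand-in for the shared 'writers' list; not Hadoop code.
class WriterListRace {

    // Returns true if removing an entry while a for-each loop walks the
    // same ArrayList raises ConcurrentModificationException.
    static boolean failsWhenRemovedDuringIteration(List<String> writers) {
        try {
            for (String w : writers) {
                // Stands in for the error-checking path dropping a broken writer.
                writers.remove(w);
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Iterates over a copy and removes from the live list, so the iterator
    // never observes a structural change. Returns the remaining entry count.
    static int removeViaSnapshot(List<String> writers) {
        for (String w : new ArrayList<>(writers)) {
            if (w.startsWith("hdfs")) {
                writers.remove(w);
            }
        }
        return writers.size();
    }

    public static void main(String[] args) {
        // Three entries, so the iterator still calls next() after the removal.
        List<String> a = new ArrayList<>(Arrays.asList("local", "hdfs-a", "hdfs-b"));
        System.out.println("CME raised: " + failsWhenRemovedDuringIteration(a));

        List<String> b = new ArrayList<>(Arrays.asList("local", "hdfs-a", "hdfs-b"));
        System.out.println("left after snapshot removal: " + removeViaSnapshot(b));
    }
}
```

The single-threaded loop is used only because it is deterministic; in the 
reported case the removal and the iteration happen on different threads, so 
the exception appears only when the timing lines up.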

Unfortunately, nothing catches this exception, so it propagates through to 
the TaskTracker (skipping the code that increments 'finishedMapTask'). Then 
another task attempt, rescheduled from 'failedMap', runs successfully, but the 
total 'finishedMapTask' count no longer covers all finished map tasks. 
JobCleanupTask cannot start, and the job stays in 'pending' status.

The root cause:
The first task attempt failed with the exception; the task was added to 
'failedMap' and the 'finishedMap' counter was decremented. The next task 
attempt ran successfully and incremented 'finishedMap' by one. Because of the 
failure, the total 'finishedMap' is less than the actual number of finished 
maps, so the cleanup task cannot run. 
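
The counter arithmetic above can be sketched with a simplified model. This 
is not the JobInProgress code: the class, its fields and its methods are 
hypothetical, and it assumes that the cleanup task is launched only when the 
finished-map counter equals the total number of maps. The first attempt of 
the last task throws before the increment, the failure path then decrements 
the counter, and the successful retry increments it once, leaving the total 
one short.

```java
import java.util.ConcurrentModificationException;

// Hypothetical, simplified model of the 'finishedMap' bookkeeping.
class MapCounterSketch {
    int finishedMaps = 0;
    final int totalMaps;

    MapCounterSketch(int totalMaps) {
        this.totalMaps = totalMaps;
    }

    // A finished attempt: if the history write throws, the increment
    // below is never reached.
    void attemptSucceeded(boolean historyLogThrows) {
        if (historyLogThrows) {
            throw new ConcurrentModificationException("history write failed");
        }
        finishedMaps++;
    }

    // Failure handling decrements the counter even though the increment
    // for this attempt was skipped.
    void attemptFailed() {
        finishedMaps--;
    }

    // Assumed launch condition for the cleanup task.
    boolean canLaunchCleanup() {
        return finishedMaps == totalMaps;
    }

    // Runs totalMaps tasks; only the last one hits the history failure
    // on its first attempt and is then retried successfully.
    static MapCounterSketch simulate(int totalMaps) {
        MapCounterSketch job = new MapCounterSketch(totalMaps);
        for (int i = 0; i < totalMaps - 1; i++) {
            job.attemptSucceeded(false);
        }
        try {
            job.attemptSucceeded(true);
        } catch (ConcurrentModificationException e) {
            job.attemptFailed();
        }
        job.attemptSucceeded(false); // retry of the failed task
        return job;
    }

    public static void main(String[] args) {
        MapCounterSketch job = simulate(2);
        System.out.println("finishedMaps=" + job.finishedMaps
                + " of " + job.totalMaps
                + ", cleanup=" + job.canLaunchCleanup());
    }
}
```

With two maps the model ends at one finished map out of two, so the assumed 
cleanup condition is never met, which matches the 'pending' job in the report.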


        
