[ https://issues.apache.org/jira/browse/MAPREDUCE-3362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13163548#comment-13163548 ]
Subroto Sanyal commented on MAPREDUCE-3362:
-------------------------------------------
Hi Denny,
By any chance, are you running into this scenario?
https://issues.apache.org/jira/browse/MAPREDUCE-2129?focusedCommentId=13081564&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13081564
> Job always stay at 'Pending' status and cannot finish several days
> ------------------------------------------------------------------
>
> Key: MAPREDUCE-3362
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3362
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: jobhistoryserver, jobtracker
> Affects Versions: 0.20.2
> Reporter: Denny Ye
> Priority: Critical
> Labels: jobtracker
>
> Our jobs stay in 'pending' status for several days. We checked the
> JobTracker log and found that one task attempt failed because it could not
> store the job history to HDFS.
> The issue starts when another job removes the folder to which this job is
> writing its history log. In this case, a 'ConcurrentModificationException'
> is thrown in JobHistory#log(ArrayList<PrintWriter> writers, RecordTypes
> recordType, Keys[] keys, String[] values, JobID id). One thread checked
> whether any writer had an output error and, because the history folder on
> HDFS had been removed, removed that writer from the list; another thread
> then got a 'ConcurrentModificationException' while the current 'writers'
> list was being emptied.
> Unfortunately, nothing catches this exception, so it propagates back to the
> TaskTracker and skips the code that increments 'finishedMapTask'. Another
> attempt of the task, rescheduled from 'failedMap', then runs successfully,
> but the 'finishedMapTask' total no longer equals the actual number of
> finished map tasks. As a result, the JobCleanupTask never starts and the job
> stays in 'pending' status indefinitely.
> The root cause:
> The first task attempt failed with the exception, so the task was added to
> 'failedMap' and the 'finishedMap' counter was decremented. The next attempt
> runs successfully and increments 'finishedMap' by one. Because of the
> earlier failure, the 'finishedMap' total is lower than the actual number of
> finished maps, so the cleanup task cannot run.
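The ConcurrentModificationException described in the report can be reproduced outside Hadoop with a plain ArrayList of PrintWriters. The following is a minimal Java sketch, not the actual JobHistory code: the class HistoryWriterDemo, the FailingWriter helper and the logUnsafe/logSafe/newWriters methods are hypothetical names chosen for illustration. To keep the demo deterministic, the removal happens on the same thread that is iterating; the fail-fast iterator check that throws is the same one a second thread would trip. The safer variant removes through the Iterator while holding the list's lock, which is only sufficient if every other thread that modifies the list synchronizes on the same object.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringWriter;
import java.io.Writer;
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

public class HistoryWriterDemo {

    // Hypothetical stand-in for a history writer whose HDFS folder was deleted:
    // every write fails, so PrintWriter swallows the IOException and
    // checkError() reports true.
    static class FailingWriter extends Writer {
        @Override
        public void write(char[] cbuf, int off, int len) throws IOException {
            throw new IOException("history folder removed");
        }
        @Override
        public void flush() {}
        @Override
        public void close() {}
    }

    // Buggy pattern: removes from the list while a for-each loop iterates it.
    static void logUnsafe(List<PrintWriter> writers, String line) {
        for (PrintWriter out : writers) {
            out.println(line);
            if (out.checkError()) {
                writers.remove(out); // structural change; the next next() throws CME
            }
        }
    }

    // Safer pattern: remove through the iterator while holding the list's lock.
    // Only correct if all other mutators of 'writers' lock the same object.
    static void logSafe(List<PrintWriter> writers, String line) {
        synchronized (writers) {
            Iterator<PrintWriter> it = writers.iterator();
            while (it.hasNext()) {
                PrintWriter out = it.next();
                out.println(line);
                if (out.checkError()) {
                    it.remove(); // keeps the iterator's modCount in sync
                }
            }
        }
    }

    // One broken writer followed by two healthy ones.
    static List<PrintWriter> newWriters() {
        List<PrintWriter> writers = new ArrayList<>();
        writers.add(new PrintWriter(new FailingWriter()));
        writers.add(new PrintWriter(new StringWriter()));
        writers.add(new PrintWriter(new StringWriter()));
        return writers;
    }

    public static void main(String[] args) {
        try {
            logUnsafe(newWriters(), "MAP_ATTEMPT_FINISHED");
            System.out.println("unsafe: no exception");
        } catch (ConcurrentModificationException e) {
            System.out.println("unsafe: ConcurrentModificationException");
        }
        List<PrintWriter> writers = newWriters();
        logSafe(writers, "MAP_ATTEMPT_FINISHED");
        System.out.println("safe: " + writers.size() + " writers left");
    }
}
```

Running main prints "unsafe: ConcurrentModificationException" followed by "safe: 2 writers left". An alternative is java.util.concurrent.CopyOnWriteArrayList, whose iterators work on a snapshot and never throw this exception; its iterators do not support remove(), though, so failed writers would be removed with list.remove(out) instead.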