[ https://issues.apache.org/jira/browse/HADOOP-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12534887 ]
Devaraj Das commented on HADOOP-2051:
-------------------------------------

Arun, could you please modify the patch to "return" when InterruptedException is thrown by queue.take()?

> JobTracker's TaskCommitQueue is vulnerable to non-IOExceptions
> --------------------------------------------------------------
>
>                 Key: HADOOP-2051
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2051
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.15.0
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>            Priority: Blocker
>             Fix For: 0.15.0
>
>         Attachments: HADOOP-2051_1_20071013.patch
>
>
> The {{JobTracker#TaskCommitQueue#run}} method only handles {{IOException}}s.
> Christian Kunz ran into a scenario where a job was stuck with all tasks in
> {{COMMIT_PENDING}} state and the stack traces showed that the "Task Commit
> Thread" wasn't even around.
> The work-around is to model {{TaskCommitQueue#run}} along the lines of other
> long-running threads in the {{JobTracker}} ({{ExpireLaunchingTasks}},
> {{ExpireTrackers}} etc.) to catch, log and ignore any {{Exception}} in a loop.
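
For illustration only, here is a minimal sketch of the loop shape discussed in this issue: catch, log and ignore any Exception raised while processing an item, but return when the blocking take() is interrupted. It is not the attached patch or the actual JobTracker code. The class name CommitLoopSketch, the in-memory log list, and the use of Runnable as the queued item type are all assumptions made for the example.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical stand-in for a long-running commit thread; names are illustrative.
class CommitLoopSketch implements Runnable {
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
    // Stand-in for a real logger so the behaviour is easy to inspect.
    final List<String> log = new CopyOnWriteArrayList<>();

    void add(Runnable task) {
        queue.add(task);
    }

    @Override
    public void run() {
        while (true) {
            Runnable task;
            try {
                // Blocks until an item is available.
                task = queue.take();
            } catch (InterruptedException ie) {
                // Interrupted while waiting: treat it as a request to stop.
                log.add("interrupted, exiting");
                return;
            }
            try {
                task.run();
            } catch (Exception e) {
                // Catch, log and ignore, so one failing item cannot kill the thread.
                log.add("ignored: " + e);
            }
        }
    }
}
```

Keeping take() in its own try block is a choice made for this sketch: it lets the interrupt case return without being swallowed by the broader catch of Exception around the per-item work.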