JobTracker's TaskCommitQueue is vulnerable to non-IOExceptions
--------------------------------------------------------------

                 Key: HADOOP-2051
                 URL: https://issues.apache.org/jira/browse/HADOOP-2051
             Project: Hadoop
          Issue Type: Bug
          Components: mapred
    Affects Versions: 0.15.0
            Reporter: Arun C Murthy
            Assignee: Arun C Murthy
            Priority: Blocker
             Fix For: 0.15.0


The {{JobTracker#TaskCommitQueue#run}} method only handles {{IOException}}s, 
so any other exception thrown inside it terminates the thread. Christian Kunz 
ran into a scenario where a job was stuck with all tasks in {{COMMIT_PENDING}} 
state and the stack traces showed that the "Task Commit Thread" was no longer 
running.

The work-around is to model {{TaskCommitQueue#run}} along the lines of other 
long-running threads in the {{JobTracker}} ({{ExpireLaunchingTasks}}, 
{{ExpireTrackers}}, etc.), which catch, log and ignore any {{Exception}} inside 
their loop.
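
To illustrate the pattern, here is a minimal, self-contained sketch of a queue-draining thread that catches, logs and ignores any {{Exception}} per iteration. It is not the actual JobTracker code: {{CommitLoopSketch}}, {{CommitAction}}, {{CommitWorker}} and {{demo}} are hypothetical names invented for this example, and treating an interrupt as the shutdown signal is an assumption.

```java
import java.io.IOException;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

class CommitLoopSketch {
    /** Hypothetical stand-in for the work of committing one task. */
    interface CommitAction {
        void commit() throws Exception;
    }

    /** Hypothetical worker; drains the queue and survives any Exception. */
    static class CommitWorker implements Runnable {
        private final BlockingQueue<CommitAction> queue;
        final AtomicInteger committed = new AtomicInteger();
        final AtomicInteger failed = new AtomicInteger();

        CommitWorker(BlockingQueue<CommitAction> queue) {
            this.queue = queue;
        }

        @Override
        public void run() {
            while (true) {
                try {
                    CommitAction action = queue.take(); // blocks until work arrives
                    action.commit();
                    committed.incrementAndGet();
                } catch (InterruptedException ie) {
                    return; // assumption: an interrupt means "shut down"
                } catch (Exception e) {
                    // Catch, log and ignore: one bad commit must not kill the thread.
                    failed.incrementAndGet();
                    System.err.println("Commit failed, continuing: " + e);
                }
            }
        }
    }

    /** Returns {committed, failed, aliveAfterFailures (1 or 0)}. */
    static int[] demo() throws InterruptedException {
        BlockingQueue<CommitAction> queue = new LinkedBlockingQueue<>();
        CommitWorker worker = new CommitWorker(queue);
        queue.add(() -> { throw new IOException("disk error"); });
        queue.add(() -> { throw new NullPointerException("unexpected"); });
        queue.add(() -> { }); // a commit that succeeds

        Thread thread = new Thread(worker, "Task Commit Thread (sketch)");
        thread.start();
        // Wait until all three actions have been handled.
        while (worker.committed.get() + worker.failed.get() < 3) {
            Thread.sleep(10);
        }
        boolean alive = thread.isAlive(); // still looping, blocked in take()
        thread.interrupt();
        thread.join();
        return new int[] { worker.committed.get(), worker.failed.get(), alive ? 1 : 0 };
    }

    public static void main(String[] args) throws InterruptedException {
        int[] result = demo();
        System.out.println("committed=" + result[0] + " failed=" + result[1]
                + " alive=" + (result[2] == 1));
    }
}
```

With a handler for {{IOException}} only, the {{NullPointerException}} in the second action would propagate out of {{run}} and end the thread, leaving the third action uncommitted.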

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
