[ https://issues.apache.org/jira/browse/HADOOP-5285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vinod K V updated HADOOP-5285:
------------------------------

    Attachment: 5285.1.patch

Attaching patch with minor changes:
 - Documented in CleanupQueue's constructor that the class is a singleton, that it does cleanup in a separate thread, and that the constructor itself automatically starts that thread as a daemon.
 - Removed an unused variable 'events' in JobTracker.getTaskCompletionEvents()

> JobTracker hangs for long periods of time
> -----------------------------------------
>
>                 Key: HADOOP-5285
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5285
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.20.0
>            Reporter: Vinod K V
>            Assignee: Devaraj Das
>            Priority: Blocker
>             Fix For: 0.20.0
>
>         Attachments: 5285.1.patch, 5285.patch, trace.txt
>
>
> On one of the larger clusters of 2000 nodes, the JT hung quite often, sometimes
> for periods on the order of 10-15 minutes and once for one and a half hours(!).
> The stack trace shows that JobInProgress.obtainTaskCleanupTask() is waiting
> for the lock on the JobInProgress object, which JobInProgress.initTasks() holds
> for a long time while waiting for DFS operations.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
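
For readers unfamiliar with the pattern the patch note documents (a singleton whose constructor starts a daemon thread that does cleanup off the caller's thread), here is a minimal sketch. It is not the CleanupQueue class from the attached patch: the class name CleanupQueueSketch, the getInstance() accessor, and the use of Runnable tasks are all assumptions made for illustration.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical illustration only; not the CleanupQueue from 5285.1.patch.
final class CleanupQueueSketch {
    // Single shared instance (assumed accessor style; the real class may differ).
    private static final CleanupQueueSketch INSTANCE = new CleanupQueueSketch();

    // Unbounded queue of pending cleanup tasks.
    private final BlockingQueue<Runnable> pending = new LinkedBlockingQueue<>();

    // The constructor itself starts the worker, marked as a daemon so that
    // it never prevents the JVM from exiting.
    private CleanupQueueSketch() {
        Thread worker = new Thread(this::drain, "cleanup-sketch");
        worker.setDaemon(true);
        worker.start();
    }

    static CleanupQueueSketch getInstance() {
        return INSTANCE;
    }

    // Callers only enqueue and return at once; the slow work runs on the
    // worker thread, so the caller does not wait for it.
    void addToQueue(Runnable task) {
        pending.add(task);
    }

    // Worker loop: block until a task arrives, then run it.
    private void drain() {
        while (true) {
            try {
                pending.take().run();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            } catch (RuntimeException e) {
                // One failing task should not kill the worker thread.
                System.err.println("cleanup task failed: " + e);
            }
        }
    }
}
```

A caller would hand off slow work with something like CleanupQueueSketch.getInstance().addToQueue(task) and return immediately, which is the property that matters when the caller holds a contended lock.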