[ https://issues.apache.org/jira/browse/HADOOP-5285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675920#action_12675920 ]
Hudson commented on HADOOP-5285:
--------------------------------

Integrated in Hadoop-trunk #763 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/763/])
Adding a file that I missed in my earlier commit.
Fixes the issues:
(1) obtainTaskCleanupTask checks whether the job is inited before trying to lock the JobInProgress.
(2) Moves the CleanupQueue class out of the TaskTracker and makes it a generic class that the JobTracker also uses to delete paths on the job's output filesystem.
(3) Moves the references to completedJobStore outside the block where the JobTracker is locked.
Contributed by Devaraj Das.

> JobTracker hangs for long periods of time
> -----------------------------------------
>
>                 Key: HADOOP-5285
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5285
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.20.0
>            Reporter: Vinod K V
>            Assignee: Devaraj Das
>            Priority: Blocker
>             Fix For: 0.20.0, 0.21.0
>
>         Attachments: 5285.1.patch, 5285.patch, trace.txt
>
>
> On one of the larger clusters of 2000 nodes, the JT hung quite often, sometimes
> for 10-15 minutes and once for one and a half hours (!).
> The stack trace shows that JobInProgress.obtainTaskCleanupTask() is waiting
> for the lock on the JobInProgress object, which JobInProgress.initTasks() holds
> for a long time while it waits on DFS operations.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
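Fix (1) addresses the hang described in the report: a caller that only wants a cleanup task should not queue up behind initTasks() while it holds the job's lock during slow DFS work. The sketch below shows that "check a flag before locking" pattern in isolation. SketchJob, the tasksInited flag, initTasks(long) and the String return value are hypothetical simplifications, not the real JobInProgress API, and the actual patch may implement the check differently.

```java
// Hypothetical, simplified stand-in for JobInProgress; names and signatures are illustrative only.
class SketchJob {
    // Set once initialization finishes; volatile so readers see it without taking the lock.
    private volatile boolean tasksInited = false;

    // Simulates initTasks(): holds the job's monitor while doing slow work,
    // with Thread.sleep standing in for long-running DFS operations.
    synchronized void initTasks(long slowWorkMillis) {
        try {
            Thread.sleep(slowWorkMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return; // leave the job uninitialized if interrupted
        }
        tasksInited = true;
    }

    // Mirrors the idea of fix (1): check the flag first and return immediately
    // if the job is not inited, instead of blocking on the monitor initTasks() holds.
    String obtainTaskCleanupTask() {
        if (!tasksInited) {
            return null; // the caller simply gets no cleanup task this time
        }
        synchronized (this) {
            // The flag never goes back to false, so once we get here the lock
            // is no longer held for initialization.
            return "cleanup-task";
        }
    }
}
```

Because the flag is written once and only ever flips from false to true, an unsynchronized read is safe as a fast path; an AtomicBoolean would work equally well.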
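Fix (2) turns CleanupQueue into a generic helper so the JobTracker, like the TaskTracker, can hand off path deletion instead of doing it inline. A minimal sketch of that idea is a queue drained by one background thread, so callers never perform the delete I/O themselves. SketchCleanupQueue and addToQueue are hypothetical names, and the sketch deletes local paths with java.nio.file rather than going through Hadoop's FileSystem API on the job's output filesystem, as the real class would.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical generic cleanup queue: callers enqueue paths and return at once;
// a single daemon thread performs the (possibly slow) recursive deletes.
class SketchCleanupQueue {
    private final BlockingQueue<Path> queue = new LinkedBlockingQueue<>();

    SketchCleanupQueue() {
        Thread worker = new Thread(this::drain, "cleanup-queue");
        worker.setDaemon(true); // do not keep the JVM alive just for cleanup
        worker.start();
    }

    // Cheap for the caller: no filesystem I/O happens on its thread or under its locks.
    void addToQueue(Path... paths) {
        for (Path p : paths) {
            queue.add(p);
        }
    }

    private void drain() {
        while (true) {
            Path p;
            try {
                p = queue.take(); // blocks only the worker thread
            } catch (InterruptedException e) {
                return;
            }
            try {
                deleteRecursively(p);
            } catch (IOException | UncheckedIOException e) {
                System.err.println("cleanup of " + p + " failed: " + e);
            }
        }
    }

    private static void deleteRecursively(Path root) throws IOException {
        if (!Files.exists(root)) {
            return; // already gone; nothing to do
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(root)) {
            // Children sort after their parents, so reverse order deletes them first.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }
}
```

Using one worker thread keeps deletions ordered and bounded; the trade-off is that a slow delete only delays later cleanups, never the JobTracker or TaskTracker code that enqueued them.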