[ https://issues.apache.org/jira/browse/HADOOP-5285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12674978#action_12674978 ]
Vinod K V commented on HADOOP-5285:
-----------------------------------

In an offline discussion with Devaraj, Amareshwari and Hemanth, the suggested fix is to make JobInProgress.obtainTaskCleanupTask() behave like other methods such as JobInProgress.obtainJobSetupTask(), and not lock on JobInProgress if the job is not yet initialized.

The other suggestion was to move all DFS operations in the JT that might result in locking of the JT into a separate thread; at present the only operation that needs to be moved is job cleanup from JobTracker.finalizeJob().

> JobTracker hangs for long periods of time
> -----------------------------------------
>
>                 Key: HADOOP-5285
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5285
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.20.0
>            Reporter: Vinod K V
>            Priority: Blocker
>             Fix For: 0.20.0
>
>
> On one of the larger clusters of 2000 nodes, the JT hung quite often, sometimes
> for periods on the order of 10-15 minutes and once for one and a half hours(!).
> The stack trace shows that JobInProgress.obtainTaskCleanupTask() is waiting
> for the lock on the JobInProgress object, which JobInProgress.initTasks() is
> holding for a long time while waiting for DFS operations.
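
The first suggestion in the comment above (skip the JobInProgress lock when the job is not yet initialized) could look roughly like the sketch below. This is only an illustration and not the actual patch: the class JobSketch, the tasksInited flag and the string return value are hypothetical stand-ins, and the real JobInProgress methods have different signatures.

```java
// Hypothetical stand-in for JobInProgress; NOT the actual Hadoop class.
class JobSketch {
    // Assumed flag: set once initialization completes. It is volatile so
    // that other threads can read it without taking the job lock.
    private volatile boolean tasksInited = false;

    // Stand-in for initTasks(): holds the job lock for the whole call.
    synchronized void initTasks() {
        // ... slow DFS operations would run here, with the lock held ...
        tasksInited = true;
    }

    // Stand-in for obtainTaskCleanupTask(): returns null right away if the
    // job is not initialized, so the caller never waits behind initTasks().
    String obtainTaskCleanupTask() {
        if (!tasksInited) {
            return null; // unlocked check; nothing to clean up yet
        }
        synchronized (this) {
            // Placeholder for choosing a real task-cleanup task.
            return "task-cleanup";
        }
    }
}
```

With this shape, a caller that reaches obtainTaskCleanupTask() while initTasks() is still blocked on DFS gets null back instead of queueing on the lock. The trade-off is that such a caller sees no cleanup task until initialization has finished.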