[ 
https://issues.apache.org/jira/browse/HADOOP-5285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12674978#action_12674978
 ] 

Vinod K V commented on HADOOP-5285:
-----------------------------------

In an offline discussion with Devaraj, Amareshwari and Hemanth, the suggested 
fix is to make JobInProgress.obtainTaskCleanupTask() behave like other methods 
such as JobInProgress.obtainJobSetupTask(), and not lock on JobInProgress if the 
job is not yet initialized. The other suggestion was to move all DFS operations 
in the JT that could keep the JT locked into a separate thread; at present, the 
only operation that needs to be moved is job cleanup in 
JobTracker.finalizeJob().

> JobTracker hangs for long periods of time
> -----------------------------------------
>
>                 Key: HADOOP-5285
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5285
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.20.0
>            Reporter: Vinod K V
>            Priority: Blocker
>             Fix For: 0.20.0
>
>
> On one of the larger clusters, with 2000 nodes, the JT hung quite often, sometimes 
> for 10-15 minutes and once for an hour and a half(!). 
> The stack trace shows that JobInProgress.obtainTaskCleanupTask() is waiting 
> for the lock on the JobInProgress object, which JobInProgress.initTasks() holds 
> for a long time while waiting on DFS operations.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
