[ https://issues.apache.org/jira/browse/HADOOP-1970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12531411 ]
Arun C Murthy commented on HADOOP-1970:
---------------------------------------

Ok, this is indeed a deadlock. The issue is that Progress.java acquires the locks on parent and child Progress objects in different orders. As the stack trace shows, {{Progress.complete}} locks the child first and then the parent, whereas {{Progress.toString(StringBuffer)}} locks the parent first and then the child. Here "main" holds the child (0xd4e30aa8) in {{complete}} and waits for the parent (0xd4e30958) in {{startNextPhase}}, while the comm thread holds the parent in {{toString}} and waits for the child, so neither thread can proceed.

The straightforward fix is to ensure that the parent is always locked first, e.g. in {{Progress.complete}}.

> tasktracker hang in reduce. Deadlock between main and comm thread
> -----------------------------------------------------------------
>
>                 Key: HADOOP-1970
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1970
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.14.1
>            Reporter: Koji Noguchi
>            Assignee: Vivek Ratan
>            Priority: Blocker
>             Fix For: 0.14.2
>
>
> Saw one reduce task stuck on copy.
> jstack on the reduce task (task_200709272248_0001_r_000150_0) process showed:
> {noformat}
> Found one Java-level deadlock:
> =============================
> "Comm thread for task_200709272248_0001_r_000150_0":
>   waiting to lock monitor 0x08144020 (object 0xd4e30aa8, a org.apache.hadoop.util.Progress),
>   which is held by "main"
> "main":
>   waiting to lock monitor 0x08144084 (object 0xd4e30958, a org.apache.hadoop.util.Progress),
>   which is held by "Comm thread for task_200709272248_0001_r_000150_0"
> Java stack information for the threads listed above:
> ===================================================
> "Comm thread for task_200709272248_0001_r_000150_0":
>         at org.apache.hadoop.util.Progress.toString(Progress.java:113)
>         - waiting to lock <0xd4e30aa8> (a org.apache.hadoop.util.Progress)
>         at org.apache.hadoop.util.Progress.toString(Progress.java:116)
>         - locked <0xd4e30958> (a org.apache.hadoop.util.Progress)
>         at org.apache.hadoop.util.Progress.toString(Progress.java:108)
>         at org.apache.hadoop.mapred.Task$1.run(Task.java:268)
>         at java.lang.Thread.run(Thread.java:619)
> "main":
>         at org.apache.hadoop.util.Progress.startNextPhase(Progress.java:58)
>         - waiting to lock <0xd4e30958> (a org.apache.hadoop.util.Progress)
>         at org.apache.hadoop.util.Progress.complete(Progress.java:70)
>         - locked <0xd4e30aa8> (a org.apache.hadoop.util.Progress)
>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:253)
>         at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1777)
> {noformat}
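The parent-first lock ordering suggested in the comment can be sketched in Java as follows. This is a minimal, hypothetical illustration, not the actual `org.apache.hadoop.util.Progress` code and not the fix that was committed for HADOOP-1970: the class `ProgressNode`, its `addPhase`/`get` helpers, and the stress demo in `main` are invented for the example, and the method names `complete`, `startNextPhase`, and `toString` only mirror the frames in the stack trace. It assumes each node's parent is fixed at construction and that `startNextPhase` locks only the parent.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified stand-in for a parent/child progress tree.
 * NOT the real org.apache.hadoop.util.Progress: it only illustrates the
 * "always lock the parent before the child" rule.
 */
final class ProgressNode {
  private final ProgressNode parent;              // null for the root; fixed at construction
  private final List<ProgressNode> phases = new ArrayList<>(); // guarded by this
  private final String status;
  private int currentPhase = 0;                   // guarded by this
  private float progress = 0.0f;                  // guarded by this

  ProgressNode(String status) {
    this(null, status);
  }

  private ProgressNode(ProgressNode parent, String status) {
    this.parent = parent;
    this.status = status;
  }

  /** Adds a child phase. Locks only this node. */
  synchronized ProgressNode addPhase(String childStatus) {
    ProgressNode child = new ProgressNode(this, childStatus);
    phases.add(child);
    return child;
  }

  /** Moves on to the next child phase. Locks only this node. */
  private synchronized void startNextPhase() {
    currentPhase++;
  }

  synchronized float get() {
    return progress;
  }

  /**
   * Marks this phase complete and advances the parent. Takes the parent's
   * monitor BEFORE this node's, the same order toString(StringBuilder) uses.
   */
  void complete() {
    if (parent == null) {
      synchronized (this) {
        progress = 1.0f;
      }
      return;
    }
    synchronized (parent) {      // 1st lock: parent
      synchronized (this) {      // 2nd lock: child
        progress = 1.0f;
      }
      parent.startNextPhase();   // re-entrant: this thread already holds the parent's monitor
    }
  }

  /** Locks this node, then recurses into the current child (parent -> child). */
  synchronized void toString(StringBuilder buf) {
    buf.append(status);
    if (currentPhase < phases.size()) {
      buf.append(" > ");
      phases.get(currentPhase).toString(buf);
    }
  }

  @Override
  public String toString() {
    StringBuilder buf = new StringBuilder();
    toString(buf);
    return buf.toString();
  }

  /** Stress demo: one thread renders status while another completes phases. */
  public static void main(String[] args) throws InterruptedException {
    ProgressNode root = new ProgressNode("reduce");
    List<ProgressNode> children = new ArrayList<>();
    for (int i = 0; i < 100_000; i++) {
      children.add(root.addPhase("phase-" + i));
    }
    Thread reporter = new Thread(() -> {
      for (int i = 0; i < 100_000; i++) {
        root.toString();         // parent -> child
      }
    }, "reporter");
    Thread completer = new Thread(() -> children.forEach(ProgressNode::complete), "completer");
    reporter.setDaemon(true);    // let the JVM exit even if a thread were stuck
    completer.setDaemon(true);
    reporter.start();
    completer.start();
    reporter.join(10_000);
    completer.join(10_000);
    boolean stuck = reporter.isAlive() || completer.isAlive();
    System.out.println(stuck ? "a thread is still blocked (possible deadlock)" : "both threads finished");
  }
}
```

Because both the rendering path and the completion path acquire monitors top-down (parent, then child), no thread can hold a child's monitor while waiting for its parent's, which is the cycle shown under "main" in the trace. An alternative way to break the cycle is for `complete` to update its own state, release its monitor, and only then call into the parent, so it never holds two monitors at once.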