[
https://issues.apache.org/jira/browse/MAPREDUCE-805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12735907#action_12735907
]
Vinod K V commented on MAPREDUCE-805:
-------------------------------------
Had a cursory look at the patch. It would be good to add javadoc for
JobInProgress.initTasks() and JobInProgress.fail() stating that these
methods are NOT supposed to be called directly by schedulers, and suggesting
that the JobTracker methods be preferred over the JobInProgress methods for
general use.
Given this issue, it would also be helpful to document the locking order
(JobTracker first, then JobInProgress) so that, for example, schedulers don't
lock JobInProgress asynchronously before calling these methods.
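To illustrate the locking order, here is a minimal sketch. Tracker and Job are
hypothetical stand-ins, not the real JobTracker and JobInProgress classes, and
the method names are made up for the example. Both code paths take the tracker
lock first and the job lock second, so neither thread can hold one lock while
waiting for the other in the reverse order.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class LockOrderSketch {
    // Hypothetical stand-ins for JobTracker and JobInProgress (not the Hadoop classes).
    static class Tracker {
        int finalized = 0;
    }

    static class Job {
        int counterReads = 0;
        boolean failed = false;
    }

    // RPC-style path: tracker lock first, then job lock.
    static int readCounters(Tracker tracker, Job job) {
        synchronized (tracker) {
            synchronized (job) {
                return ++job.counterReads;
            }
        }
    }

    // Failure path: same order, tracker first, then job.
    // Taking the job lock first here would recreate the reported cycle.
    static void failJob(Tracker tracker, Job job) {
        synchronized (tracker) {
            synchronized (job) {
                job.failed = true;
                tracker.finalized++;
            }
        }
    }

    // Runs both paths concurrently; returns true if both finished and the
    // shared state is consistent.
    static boolean runDemo(final int iterations) {
        final Tracker tracker = new Tracker();
        final Job job = new Job();
        ExecutorService pool = Executors.newFixedThreadPool(2);
        pool.submit(() -> {
            for (int i = 0; i < iterations; i++) {
                readCounters(tracker, job);
            }
        });
        pool.submit(() -> {
            for (int i = 0; i < iterations; i++) {
                failJob(tracker, job);
            }
        });
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                return false;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        // Read the results under the same locks, in the same order.
        synchronized (tracker) {
            synchronized (job) {
                return job.failed
                        && job.counterReads == iterations
                        && tracker.finalized == iterations;
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("completed without deadlock: " + runDemo(10_000));
    }
}
```

A caller that synchronizes on the job before invoking failJob() would invert
the order again, which is why the javadoc should say so explicitly.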
Though not directly related to the patch, it will be good to document that
JobTracker is locked while calling JobInProgressListener update methods.
> Deadlock in Jobtracker
> ----------------------
>
> Key: MAPREDUCE-805
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-805
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Reporter: Michael Tamm
> Attachments: MAPREDUCE-805-v1.1.patch
>
>
> We are running a hadoop cluster (version 0.20.0) and have detected the
> following deadlock on our jobtracker:
> {code}
> "IPC Server handler 51 on 9001":
> at org.apache.hadoop.mapred.JobInProgress.getCounters(JobInProgress.java:943)
> - waiting to lock <0x00007f2b6fb46130> (a org.apache.hadoop.mapred.JobInProgress)
> at org.apache.hadoop.mapred.JobTracker.getJobCounters(JobTracker.java:3102)
> - locked <0x00007f2b5f026000> (a org.apache.hadoop.mapred.JobTracker)
> at sun.reflect.GeneratedMethodAccessor21.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
> "pool-1-thread-2":
> at org.apache.hadoop.mapred.JobTracker.finalizeJob(JobTracker.java:2017)
> - waiting to lock <0x00007f2b5f026000> (a org.apache.hadoop.mapred.JobTracker)
> at org.apache.hadoop.mapred.JobInProgress.garbageCollect(JobInProgress.java:2483)
> - locked <0x00007f2b6fb46130> (a org.apache.hadoop.mapred.JobInProgress)
> at org.apache.hadoop.mapred.JobInProgress.terminateJob(JobInProgress.java:2152)
> - locked <0x00007f2b6fb46130> (a org.apache.hadoop.mapred.JobInProgress)
> at org.apache.hadoop.mapred.JobInProgress.terminate(JobInProgress.java:2169)
> - locked <0x00007f2b6fb46130> (a org.apache.hadoop.mapred.JobInProgress)
> at org.apache.hadoop.mapred.JobInProgress.fail(JobInProgress.java:2245)
> - locked <0x00007f2b6fb46130> (a org.apache.hadoop.mapred.JobInProgress)
> at org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:86)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:619)
> {code}