[ https://issues.apache.org/jira/browse/MAPREDUCE-805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amar Kamat updated MAPREDUCE-805:
---------------------------------

    Attachment: MAPREDUCE-805-v1.12-branch-0.20.patch
                MAPREDUCE-805-v1.12.patch

Attaching a patch that adds extra log info during job-kill. I tested the patch 
on branch 0.20 and it works as expected: a job killed during init is killed, 
and job init failure is handled as expected. I also tested with the capacity 
scheduler to check that JobTracker.failJob() raises events as expected.

> Deadlock in Jobtracker
> ----------------------
>
>                 Key: MAPREDUCE-805
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-805
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Michael Tamm
>         Attachments: MAPREDUCE-805-v1.1.patch, 
> MAPREDUCE-805-v1.11-branch-0.20.patch, MAPREDUCE-805-v1.11.patch, 
> MAPREDUCE-805-v1.12-branch-0.20.patch, MAPREDUCE-805-v1.12.patch, 
> MAPREDUCE-805-v1.2.patch, MAPREDUCE-805-v1.3.patch, MAPREDUCE-805-v1.6.patch, 
> MAPREDUCE-805-v1.7.patch
>
>
> We are running a hadoop cluster (version 0.20.0) and have detected the 
> following deadlock on our jobtracker:
> {code}
> "IPC Server handler 51 on 9001":
>       at org.apache.hadoop.mapred.JobInProgress.getCounters(JobInProgress.java:943)
>       - waiting to lock <0x00007f2b6fb46130> (a org.apache.hadoop.mapred.JobInProgress)
>       at org.apache.hadoop.mapred.JobTracker.getJobCounters(JobTracker.java:3102)
>       - locked <0x00007f2b5f026000> (a org.apache.hadoop.mapred.JobTracker)
>       at sun.reflect.GeneratedMethodAccessor21.invoke(Unknown Source)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>       at java.lang.reflect.Method.invoke(Method.java:597)
>       at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
>       at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
>       at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:396)
>       at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
> "pool-1-thread-2":
>       at org.apache.hadoop.mapred.JobTracker.finalizeJob(JobTracker.java:2017)
>       - waiting to lock <0x00007f2b5f026000> (a org.apache.hadoop.mapred.JobTracker)
>       at org.apache.hadoop.mapred.JobInProgress.garbageCollect(JobInProgress.java:2483)
>       - locked <0x00007f2b6fb46130> (a org.apache.hadoop.mapred.JobInProgress)
>       at org.apache.hadoop.mapred.JobInProgress.terminateJob(JobInProgress.java:2152)
>       - locked <0x00007f2b6fb46130> (a org.apache.hadoop.mapred.JobInProgress)
>       at org.apache.hadoop.mapred.JobInProgress.terminate(JobInProgress.java:2169)
>       - locked <0x00007f2b6fb46130> (a org.apache.hadoop.mapred.JobInProgress)
>       at org.apache.hadoop.mapred.JobInProgress.fail(JobInProgress.java:2245)
>       - locked <0x00007f2b6fb46130> (a org.apache.hadoop.mapred.JobInProgress)
>       at org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:86)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>       at java.lang.Thread.run(Thread.java:619)
> {code}
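
The dump above shows a lock-order inversion: the IPC handler holds the 
JobTracker monitor and waits for the JobInProgress monitor, while 
pool-1-thread-2 holds the JobInProgress monitor and waits for the JobTracker 
monitor. One common remedy for this kind of cycle is to take both locks in a 
single fixed order on every path. The sketch below illustrates that idea only; 
it is not the attached patch. The class, the two lock objects, and the method 
names are hypothetical stand-ins, not Hadoop code.

```java
public class LockOrderSketch {
    // Hypothetical stand-ins for the two monitors seen in the thread dump.
    static final Object trackerLock = new Object();
    static final Object jobLock = new Object();

    // Both counters are only touched while trackerLock is held.
    static int counterReads = 0;
    static int finalized = 0;

    // Mirrors the counters path in the dump: tracker lock first, then job lock.
    static void readCounters() {
        synchronized (trackerLock) {
            synchronized (jobLock) {
                counterReads++;
            }
        }
    }

    // Failure path rewritten to use the SAME order (tracker, then job).
    // In the dump this path took the job lock first, which closed the cycle.
    static void failJob() {
        synchronized (trackerLock) {
            synchronized (jobLock) {
                finalized++;
            }
        }
    }

    // Runs both paths concurrently; returns true if both threads finish.
    static boolean runConcurrently(final int iterations) {
        synchronized (trackerLock) {
            counterReads = 0;
            finalized = 0;
        }
        Thread reader = new Thread(() -> {
            for (int i = 0; i < iterations; i++) readCounters();
        });
        Thread failer = new Thread(() -> {
            for (int i = 0; i < iterations; i++) failJob();
        });
        // Daemon threads so a stuck run cannot keep the JVM alive.
        reader.setDaemon(true);
        failer.setDaemon(true);
        reader.start();
        failer.start();
        try {
            reader.join(5000);
            failer.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !reader.isAlive() && !failer.isAlive();
    }

    public static void main(String[] args) {
        System.out.println(runConcurrently(10000) ? "completed" : "stuck");
    }
}
```

Because neither path ever holds the job lock while waiting for the tracker 
lock, the circular wait from the dump cannot form in this sketch. Whether a 
fixed order is practical in the real JobTracker code depends on call paths not 
shown here.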

