[ https://issues.apache.org/jira/browse/MAPREDUCE-805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741681#action_12741681 ]

Amar Kamat commented on MAPREDUCE-805:
--------------------------------------

Note that I purposefully added sleeps in JobTracker.initJob() and
JobInProgress.initTasks() to check for race conditions, and I didn't see any
side effects. With this patch, init always leaves the job in PREP state.
Depending on whether
- setup is required,
- there are tasks to run,
- a job kill was issued during init, or
- job init failed,

the job then moves to RUNNING, SUCCEEDED, KILLED or FAILED, or remains in PREP.
The transitions are as follows (note that after job.initTasks() returns, the
job is in PREP state):
||setup needed?||maps=0 and reduces=0?||job killed during init?||init failed?||new state||comment||
|*|*|*|yes|FAILED|irrespective of the configuration, if the job fails in init, it's marked as FAILED|
|*|*|yes|no|KILLED|irrespective of the configuration, if the job is killed during init and init otherwise completed normally, the job is marked as KILLED|
|yes|*|no|no|PREP|if the job is configured to run setup, the job remains in PREP state|
|no|yes|no|no|SUCCEEDED|if the job has no setup configured and there are no maps and no reduces, the job is marked SUCCEEDED|
|no|no|no|no|RUNNING|if the job has no setup configured and there are maps or reduces to run, the job is marked RUNNING|
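
The table above is a precedence order: a failed init wins over everything, then a kill, then the setup check, and only then the task count. A minimal sketch of that decision, assuming it can be expressed as a pure function of the four table columns; JobState, InitTransition and stateAfterInit are hypothetical names for illustration, not Hadoop APIs and not the code in the MAPREDUCE-805 patches:

```java
// Hypothetical sketch of the post-init state decision from the table above.
// JobState, InitTransition and stateAfterInit are illustrative names, not Hadoop APIs.
enum JobState { PREP, RUNNING, SUCCEEDED, KILLED, FAILED }

final class InitTransition {
    private InitTransition() {}

    /** State a job moves to once initTasks() has returned (the job is still in PREP then). */
    static JobState stateAfterInit(boolean setupNeeded,
                                   boolean noTasks,          // maps == 0 && reduces == 0
                                   boolean killedDuringInit,
                                   boolean initFailed) {
        if (initFailed) {
            return JobState.FAILED;   // a failed init wins over every other column
        }
        if (killedDuringInit) {
            return JobState.KILLED;   // init passed, but a kill arrived during init
        }
        if (setupNeeded) {
            return JobState.PREP;     // the setup task still has to run
        }
        // No setup: finish immediately if there is nothing to run, otherwise start running.
        return noTasks ? JobState.SUCCEEDED : JobState.RUNNING;
    }
}
```

Checking the columns in this fixed order is what makes the "*" (don't care) entries in the table safe: a row's wildcards are only reached after every higher-priority condition has already been ruled out.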


> Deadlock in Jobtracker
> ----------------------
>
>                 Key: MAPREDUCE-805
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-805
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Michael Tamm
>         Attachments: MAPREDUCE-805-v1.1.patch, 
> MAPREDUCE-805-v1.11-branch-0.20.patch, MAPREDUCE-805-v1.11.patch, 
> MAPREDUCE-805-v1.2.patch, MAPREDUCE-805-v1.3.patch, MAPREDUCE-805-v1.6.patch, 
> MAPREDUCE-805-v1.7.patch
>
>
> We are running a hadoop cluster (version 0.20.0) and have detected the 
> following deadlock on our jobtracker:
> {code}
> "IPC Server handler 51 on 9001":
>       at org.apache.hadoop.mapred.JobInProgress.getCounters(JobInProgress.java:943)
>       - waiting to lock <0x00007f2b6fb46130> (a org.apache.hadoop.mapred.JobInProgress)
>       at org.apache.hadoop.mapred.JobTracker.getJobCounters(JobTracker.java:3102)
>       - locked <0x00007f2b5f026000> (a org.apache.hadoop.mapred.JobTracker)
>       at sun.reflect.GeneratedMethodAccessor21.invoke(Unknown Source)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>       at java.lang.reflect.Method.invoke(Method.java:597)
>       at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
>       at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
>       at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:396)
>       at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
> "pool-1-thread-2":
>       at org.apache.hadoop.mapred.JobTracker.finalizeJob(JobTracker.java:2017)
>       - waiting to lock <0x00007f2b5f026000> (a org.apache.hadoop.mapred.JobTracker)
>       at org.apache.hadoop.mapred.JobInProgress.garbageCollect(JobInProgress.java:2483)
>       - locked <0x00007f2b6fb46130> (a org.apache.hadoop.mapred.JobInProgress)
>       at org.apache.hadoop.mapred.JobInProgress.terminateJob(JobInProgress.java:2152)
>       - locked <0x00007f2b6fb46130> (a org.apache.hadoop.mapred.JobInProgress)
>       at org.apache.hadoop.mapred.JobInProgress.terminate(JobInProgress.java:2169)
>       - locked <0x00007f2b6fb46130> (a org.apache.hadoop.mapred.JobInProgress)
>       at org.apache.hadoop.mapred.JobInProgress.fail(JobInProgress.java:2245)
>       - locked <0x00007f2b6fb46130> (a org.apache.hadoop.mapred.JobInProgress)
>       at org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:86)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>       at java.lang.Thread.run(Thread.java:619)
> {code}
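
The trace above is a lock-ordering cycle: the IPC handler holds the JobTracker monitor (getJobCounters) and waits for the JobInProgress monitor (getCounters), while the init thread holds the JobInProgress monitor (fail, terminate, terminateJob, garbageCollect) and waits for the JobTracker monitor in finalizeJob(). A minimal sketch of the general remedy, taking both monitors in the same order on every path, follows. Tracker and Job are hypothetical stand-ins for JobTracker and JobInProgress, and this is not claimed to be what the MAPREDUCE-805 patches actually do:

```java
// Hypothetical stand-ins for JobTracker and JobInProgress; not the Hadoop classes.
final class Tracker {
    private int finalizedJobs;

    // Like getJobCounters(): tracker monitor first, then the job monitor.
    synchronized long getJobCounters(Job job) {
        return job.getCounters();
    }

    // Like finalizeJob(): needs the tracker monitor.
    synchronized void finalizeJob(Job job) {
        finalizedJobs++;
    }

    synchronized int finalizedJobs() {
        return finalizedJobs;
    }
}

final class Job {
    private long counters;
    private boolean failed;

    synchronized long getCounters() {
        return counters;
    }

    // Deadlock-prone shape from the trace (job monitor, then tracker monitor):
    //   synchronized void fail(Tracker t) { failed = true; t.finalizeJob(this); }
    //
    // Consistent order instead: tracker monitor first, then job monitor, the same
    // order getJobCounters() uses, so the two paths cannot form a cycle.
    void fail(Tracker tracker) {
        synchronized (tracker) {
            synchronized (this) {
                failed = true;
            }
            // Reentrant: this thread already holds the tracker monitor.
            tracker.finalizeJob(this);
        }
    }

    synchronized boolean isFailed() {
        return failed;
    }
}
```

An alternative with the same effect is to decide the new state under the job monitor, release it, and only then call into the tracker; either way, no thread may hold the job monitor while waiting for the tracker monitor.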

-- 
This message is automatically generated by JIRA.