Hi,

I think I have run into a possible deadlock, though I am not sure whether it
is actually a deadlock or not :-)
Here is my scenario:

Run a job and call JobClient.monitorAndPrintJob to monitor it and get status
updates.
In parallel, invoke JobClient$NetworkedJob.killJob from another thread.

For reference, I am including the thread dumps for both operations:
"MrPlanRunner" daemon prio=5 tid=7fe12cacf000 nid=0x11352f000 in Object.wait() [11352d000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <7f3c55668> (a org.apache.hadoop.ipc.Client$Call)
        at java.lang.Object.wait(Object.java:485)
        at org.apache.hadoop.ipc.Client.call(Client.java:1145)
        - locked <7f3c55668> (a org.apache.hadoop.ipc.Client$Call)
        at org.apache.hadoop.ipc.Client.call(Client.java:1122)
        at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:148)
        at $Proxy40.getApplicationReport(Unknown Source)
        at org.apache.hadoop.yarn.api.impl.pb.client.ClientRMProtocolPBClientImpl.getApplicationReport(ClientRMProtocolPBClientImpl.java:116)
        at org.apache.hadoop.mapred.ResourceMgrDelegate.getApplicationReport(ResourceMgrDelegate.java:343)
        at org.apache.hadoop.mapred.ClientServiceDelegate.getProxy(ClientServiceDelegate.java:143)
        at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:296)
        - locked <7f4d78950> (a org.apache.hadoop.mapred.ClientServiceDelegate)
        at org.apache.hadoop.mapred.ClientServiceDelegate.getJobStatus(ClientServiceDelegate.java:373)
        at org.apache.hadoop.mapred.YARNRunner.getJobStatus(YARNRunner.java:483)
        at org.apache.hadoop.mapreduce.Job$1.run(Job.java:322)
        at org.apache.hadoop.mapreduce.Job$1.run(Job.java:319)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
        at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:319)
        - locked <7f4f70fc0> (a org.apache.hadoop.mapreduce.Job)
        at org.apache.hadoop.mapreduce.Job.isComplete(Job.java:598)
        at org.apache.hadoop.mapreduce.Job.monitorAndPrintJob(Job.java:1280)
        at org.apache.hadoop.mapred.JobClient$NetworkedJob.monitorAndPrintJob(JobClient.java:432)
        at org.apache.hadoop.mapred.JobClient.monitorAndPrintJob(JobClient.java:902)
        at xxxxx.runJob(xxxxx.java:74)
        at xxxxx.doExecute(xxxxx.java:39)
        at xxxxx.doExecute(xxxxx.java:1)
        at xxxxexecute(xxxxxx.java:29)
        at xxxx.MrPlanRunner.run(xxxxx.java:117)
        at java.lang.Thread.run(Thread.java:680)

"Thread-2" prio=5 tid=7fe12e2de800 nid=0x114d15000 waiting for monitor entry [114d13000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:286)
        - waiting to lock <7f4d78950> (a org.apache.hadoop.mapred.ClientServiceDelegate)
        at org.apache.hadoop.mapred.ClientServiceDelegate.getJobStatus(ClientServiceDelegate.java:373)
        at org.apache.hadoop.mapred.YARNRunner.killJob(YARNRunner.java:509)
        at org.apache.hadoop.mapreduce.Job.killJob(Job.java:622)
        at org.apache.hadoop.mapred.JobClient$NetworkedJob.killJob(JobClient.java:319)
        - locked <7f4f8fa68> (a org.apache.hadoop.mapred.JobClient$NetworkedJob)
        at xxxx.cancelCurrentJob(xxxxxx.java:150)
        at xxxx.cancel(xxxxx.java:171)
        at xxxx.testCancelJob(xxxx.java:135)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
        at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
        at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
        at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
        at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:46)
        at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:46)
        at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:62)


In the thread dump we can see that the object <7f4d78950> (a
ClientServiceDelegate) is locked by the MrPlanRunner thread (the thread
calling JobClient.monitorAndPrintJob), which is itself waiting on an RPC call
(Client$Call). Thread-2 (the thread calling JobClient$NetworkedJob.killJob)
tries to lock the same object and gets BLOCKED.
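
The pattern in the two dumps can be sketched in plain Java without any Hadoop
classes. This is only an illustration under assumptions: Delegate,
getJobStatusSlow, getJobStatusForKill and observeKillerState are hypothetical
names, and the CountDownLatch merely stands in for an RPC reply that has not
arrived yet. It is not the actual ClientServiceDelegate code.

```java
import java.util.concurrent.CountDownLatch;

public class MonitorContentionDemo {

    /** Hypothetical stand-in for a delegate whose calls all share one monitor. */
    static class Delegate {
        final CountDownLatch lockHeld = new CountDownLatch(1);
        final CountDownLatch rpcReply = new CountDownLatch(1);

        // Models the monitoring path: takes the monitor, then waits for a slow "RPC".
        synchronized void getJobStatusSlow() throws InterruptedException {
            lockHeld.countDown();   // signal that the monitor is now held
            rpcReply.await();       // wait for the reply while still holding the monitor
        }

        // Models the kill path: needs the same monitor before it can do anything.
        synchronized void getJobStatusForKill() {
            // nothing to do in this sketch
        }
    }

    /** Returns the state observed for the "kill" thread while the monitor is held. */
    static Thread.State observeKillerState() throws InterruptedException {
        Delegate delegate = new Delegate();

        Thread monitor = new Thread(() -> {
            try {
                delegate.getJobStatusSlow();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "MrPlanRunner");
        Thread killer = new Thread(delegate::getJobStatusForKill, "Thread-2");
        monitor.setDaemon(true);
        killer.setDaemon(true);

        monitor.start();
        delegate.lockHeld.await();  // make sure the monitor is taken first
        killer.start();

        // Poll (for at most five seconds) until the kill thread is stuck on the monitor.
        long deadline = System.nanoTime() + 5_000_000_000L;
        Thread.State state = killer.getState();
        while (state != Thread.State.BLOCKED && System.nanoTime() < deadline) {
            Thread.sleep(10);
            state = killer.getState();
        }

        delegate.rpcReply.countDown();  // the "RPC" finally answers
        monitor.join();
        killer.join();                  // the kill thread can now proceed
        return state;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Thread-2 state while the monitor is held: "
                + observeKillerState());
    }
}
```

In this sketch the second thread proceeds as soon as the simulated reply
arrives, so nothing here is a cyclic deadlock. Whether the real client ever
gets past Client.call is a separate question that the dumps alone do not
answer.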

Please let me know whether this is a possible problem in the code or whether
my usage of the API is incorrect.
The build being used is 0.23.1-cdh4.0.0b2.

Cheers,
Subroto Sanyal
