In the class AMRMClientAsyncImpl, the object <7c3041e28> is locked by the heartbeat thread (which runs a more or less infinite loop, like any heartbeat thread), and the method unregisterApplicationMaster is waiting to lock the same object.
Only once unregisterApplicationMaster has acquired that lock can it tell the heartbeat thread to exit, via the boolean flag keepRunning.

Following is the thread dump for the deadlock:

"AMShutdownThread" daemon prio=5 tid=7f9a02921800 nid=0x115d68000 waiting for monitor entry [115d67000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl.unregisterApplicationMaster(AMRMClientAsyncImpl.java:156)
    - waiting to lock <7c3041e28> (a java.lang.Object)
    at org.apache.tez.dag.app.rm.TaskScheduler.serviceStop(TaskScheduler.java:394)
    - locked <7c3006aa0> (a org.apache.tez.dag.app.rm.TaskScheduler)
    at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
    - locked <7c3038008> (a java.lang.Object)
    at org.apache.tez.dag.app.rm.TaskSchedulerEventHandler.serviceStop(TaskSchedulerEventHandler.java:357)
    at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
    - locked <7c2f71360> (a java.lang.Object)
    at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
    at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
    at org.apache.tez.dag.app.DAGAppMaster.stopServices(DAGAppMaster.java:1518)
    at org.apache.tez.dag.app.DAGAppMaster.serviceStop(DAGAppMaster.java:1649)
    - locked <7c2f51790> (a org.apache.tez.dag.app.DAGAppMaster)
    at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
    - locked <7c2fed728> (a java.lang.Object)
    at org.apache.tez.dag.app.DAGAppMaster$DAGAppMasterShutdownHandler$AMShutdownRunnable.run(DAGAppMaster.java:607)
    at java.lang.Thread.run(Thread.java:695)

"AMRM Heartbeater thread" prio=5 tid=7f9a0c0e8800 nid=0x111e70000 waiting on condition [111e6f000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(Native Method)
    at org.apache.hadoop.util.ThreadUtil.sleepAtLeastIgnoreInterrupts(ThreadUtil.java:43)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:150)
    at com.sun.proxy.$Proxy9.allocate(Unknown Source)
    at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.allocate(AMRMClientImpl.java:246)
    at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$HeartbeatThread.run(AMRMClientAsyncImpl.java:224)
    - locked <7c3041e28> (a java.lang.Object)

The method in question:

    public void unregisterApplicationMaster(FinalApplicationStatus appStatus,
        String appMessage, String appTrackingUrl) throws YarnException,
        IOException {
      synchronized (unregisterHeartbeatLock) {
        keepRunning = false;
        client.unregisterApplicationMaster(appStatus, appMessage, appTrackingUrl);
      }
    }

The line "keepRunning = false" should be outside the synchronized block.

I am not sure whether this should be regarded as a problem in YARN or in Tez. The flag is private and can't be accessed by the Tez implementation, TezAMRMClientAsync.

--
Cheers,
*Subroto Sanyal*
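
A minimal, self-contained sketch of the suggested change, for illustration only. It is not the actual YARN code: apart from unregisterHeartbeatLock and keepRunning, every name here is hypothetical, a latch stands in for an allocate call stuck in retries, and keepRunning is assumed to be volatile so that a write made outside the lock is visible to the heartbeat thread. The sketch shows that the flag flips right away even while the heartbeat thread still holds the lock. The unregister call itself still has to wait for the lock.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class KeepRunningSketch {
    private final Object unregisterHeartbeatLock = new Object();
    // Assumption: volatile, so a write outside the lock is visible to other threads.
    private volatile boolean keepRunning = true;

    // Hypothetical stand-in for one pass of the heartbeat loop: the lock is
    // held for the whole duration of the (simulated) allocate call.
    void heartbeatOnce(CountDownLatch lockHeld, CountDownLatch release) {
        synchronized (unregisterHeartbeatLock) {
            if (!keepRunning) {
                return;
            }
            lockHeld.countDown();
            try {
                release.await(); // simulates allocate() stuck in retries
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    // Suggested shape: clear the flag first, then take the lock.
    void unregister() {
        keepRunning = false;
        synchronized (unregisterHeartbeatLock) {
            // the real client.unregisterApplicationMaster(...) call would go here
        }
    }

    // Returns true if the flag was cleared while the heartbeat thread still
    // held the lock, and the unregister thread finished once it was released.
    static boolean flagVisibleWhileLockHeld() {
        KeepRunningSketch s = new KeepRunningSketch();
        CountDownLatch lockHeld = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread heartbeat = new Thread(() -> s.heartbeatOnce(lockHeld, release));
        Thread stopper = new Thread(s::unregister);
        heartbeat.setDaemon(true);
        stopper.setDaemon(true);
        try {
            heartbeat.start();
            lockHeld.await();          // heartbeat thread now owns the lock
            stopper.start();
            long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
            while (s.keepRunning && System.nanoTime() < deadline) {
                Thread.sleep(10);
            }
            boolean flagCleared = !s.keepRunning;
            boolean stillBlocked = stopper.isAlive(); // cannot finish without the lock
            release.countDown();       // the simulated allocate() returns
            heartbeat.join(5000);
            stopper.join(5000);
            return flagCleared && stillBlocked && !stopper.isAlive();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(flagVisibleWhileLockHeld());
    }
}
```

With the assignment inside the synchronized block instead, the flag would stay true for as long as the heartbeat thread holds the lock.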