Subroto,

Thanks for pointing this out. This and the TezClient issue you pointed out in your previous email are being actively addressed.

Oleg

On Jun 10, 2014, at 5:42 AM, Subroto Sanyal <sanyalsubr...@gmail.com> wrote:

> In the class AMRMClientAsyncImpl the object(7c3041e28) is being locked by
> Heartbeat thread(which kinds of run a infinite loop as any heartbeat
> thread) which is requested to be locked by the method
> unregisterApplicationMaster.
>
> Once the method unregisterApplicationMaster can lock the requested object;
> then only it can notify the heartbeat thread to exit by a boolean flag
> keepRunning.
>
> Following is the thread-dump for the deadlock:
>
> "AMShutdownThread" daemon prio=5 tid=7f9a02921800 nid=0x115d68000 waiting for monitor entry [115d67000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>     at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl.unregisterApplicationMaster(AMRMClientAsyncImpl.java:156)
>     - waiting to lock <7c3041e28> (a java.lang.Object)
>     at org.apache.tez.dag.app.rm.TaskScheduler.serviceStop(TaskScheduler.java:394)
>     - locked <7c3006aa0> (a org.apache.tez.dag.app.rm.TaskScheduler)
>     at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>     - locked <7c3038008> (a java.lang.Object)
>     at org.apache.tez.dag.app.rm.TaskSchedulerEventHandler.serviceStop(TaskSchedulerEventHandler.java:357)
>     at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>     - locked <7c2f71360> (a java.lang.Object)
>     at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
>     at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
>     at org.apache.tez.dag.app.DAGAppMaster.stopServices(DAGAppMaster.java:1518)
>     at org.apache.tez.dag.app.DAGAppMaster.serviceStop(DAGAppMaster.java:1649)
>     - locked <7c2f51790> (a org.apache.tez.dag.app.DAGAppMaster)
>     at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>     - locked <7c2fed728> (a java.lang.Object)
>     at org.apache.tez.dag.app.DAGAppMaster$DAGAppMasterShutdownHandler$AMShutdownRunnable.run(DAGAppMaster.java:607)
>     at java.lang.Thread.run(Thread.java:695)
>
> "AMRM Heartbeater thread" prio=5 tid=7f9a0c0e8800 nid=0x111e70000 waiting on condition [111e6f000]
>    java.lang.Thread.State: TIMED_WAITING (sleeping)
>     at java.lang.Thread.sleep(Native Method)
>     at org.apache.hadoop.util.ThreadUtil.sleepAtLeastIgnoreInterrupts(ThreadUtil.java:43)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:150)
>     at com.sun.proxy.$Proxy9.allocate(Unknown Source)
>     at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.allocate(AMRMClientImpl.java:246)
>     at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$HeartbeatThread.run(AMRMClientAsyncImpl.java:224)
>     - locked <7c3041e28> (a java.lang.Object)
>
> public void unregisterApplicationMaster(FinalApplicationStatus appStatus,
>     String appMessage, String appTrackingUrl) throws YarnException,
>     IOException {
>   synchronized (unregisterHeartbeatLock) {
>     keepRunning = false;
>     client.unregisterApplicationMaster(appStatus, appMessage, appTrackingUrl);
>   }
> }
>
> The line "keepRunning = false" should be outside the synchronized block.
>
> I am not sure this should be regarded as problem in yarn or TEZ. The flag
> is private and can't be accessed by Tez implementation TezAMRMClientAsync.
>
> --
> Cheers,
> Subroto Sanyal
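
For illustration, the change suggested in the quoted message (setting keepRunning to false before entering the synchronized block) can be sketched in isolation. This is a minimal stand-in, not the Hadoop code: the class HeartbeatShutdownSketch, its counters, the runDemo helper and the simulated allocate delay are all hypothetical, and the sketch assumes the flag is declared volatile so that a write made outside the lock is visible to the heartbeat thread. It also assumes the heartbeat loop checks the flag while holding the lock, which the thread dump suggests but does not show.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the heartbeat/unregister interaction; not the Hadoop class.
class HeartbeatShutdownSketch {
    private final Object unregisterHeartbeatLock = new Object();
    // Assumption: volatile, so the write made outside the lock is seen by the heartbeat thread.
    private volatile boolean keepRunning = true;
    private final AtomicInteger heartbeats = new AtomicInteger();
    private volatile int heartbeatsAtUnregister = -1;
    private final Thread heartbeatThread;

    HeartbeatShutdownSketch(long allocateMillis) {
        heartbeatThread = new Thread(() -> {
            while (true) {
                synchronized (unregisterHeartbeatLock) {
                    if (!keepRunning) {
                        return; // exit as soon as the flag is observed
                    }
                    sleepQuietly(allocateMillis); // simulated slow allocate() made under the lock
                    heartbeats.incrementAndGet();
                }
            }
        }, "heartbeater-sketch");
        heartbeatThread.setDaemon(true);
    }

    void start() {
        heartbeatThread.start();
    }

    void unregisterApplicationMaster() {
        // The suggested change: flip the flag before competing for the lock.
        keepRunning = false;
        synchronized (unregisterHeartbeatLock) {
            // Simulated unregister call; record how many heartbeats were sent by now.
            heartbeatsAtUnregister = heartbeats.get();
        }
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Returns true if the heartbeat thread exits and sends nothing after unregistering.
    static boolean runDemo() {
        HeartbeatShutdownSketch s = new HeartbeatShutdownSketch(20);
        s.start();
        sleepQuietly(100);
        s.unregisterApplicationMaster();
        try {
            s.heartbeatThread.join(2000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !s.heartbeatThread.isAlive()
                && s.heartbeatsAtUnregister == s.heartbeats.get();
    }

    public static void main(String[] args) {
        System.out.println("heartbeat thread exited cleanly: " + runDemo());
    }
}
```

With the flag set outside the lock, the heartbeat loop stops after at most one more simulated allocate, so the unregistering thread no longer has to win the monitor from a thread that keeps re-acquiring it. The sketch does not help if the allocate call itself never returns while holding the lock, as with the retry sleep visible in the dump; that case would need a separate fix.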