Hi Subroto,

Could you file a JIRA for this with the output of jstack for the AM process and the AM logs?

thanks
-- Hitesh

On Jun 10, 2014, at 3:26 PM, Subroto Sanyal <sanyalsubr...@gmail.com> wrote:

> Hi,
>
> I have built the Tez jars from the git repository today; still, I see the
> DAGAppMaster running even after the TezSession is stopped.
> Do I need to get the code/jar from somewhere else to get the fix reflected?
>
> On Tue, Jun 10, 2014 at 1:54 PM, Subroto Sanyal <sanyalsubr...@gmail.com>
> wrote:
>
>> Hi Oleg,
>>
>> Thanks for confirming. Could you please provide the TEZ jira tickets in
>> which both of the issues have been solved?
>> I couldn't find the code changes for closing TezClient.
>>
>> On Tue, Jun 10, 2014 at 1:25 PM, Oleg Zhurakousky
>> <ozhurakou...@hortonworks.com> wrote:
>>
>>> Subroto
>>>
>>> Thanks for pointing this out.
>>> This and the TezClient issue you’ve pointed out in your previous email are
>>> actually being actively addressed.
>>>
>>> Oleg
>>>
>>> On Jun 10, 2014, at 5:42 AM, Subroto Sanyal <sanyalsubr...@gmail.com>
>>> wrote:
>>>
>>>> In the class AMRMClientAsyncImpl, the object (7c3041e28) is locked by
>>>> the heartbeat thread (which runs an infinite loop, like any heartbeat
>>>> thread), and the method unregisterApplicationMaster is waiting to lock
>>>> the same object.
>>>>
>>>> Only once unregisterApplicationMaster has acquired that lock can it
>>>> notify the heartbeat thread to exit, via the boolean flag keepRunning.
>>>>
>>>> Following is the thread dump for the deadlock:
>>>>
>>>> "AMShutdownThread" daemon prio=5 tid=7f9a02921800 nid=0x115d68000 waiting for monitor entry [115d67000]
>>>>    java.lang.Thread.State: BLOCKED (on object monitor)
>>>>     at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl.unregisterApplicationMaster(AMRMClientAsyncImpl.java:156)
>>>>     - waiting to lock <7c3041e28> (a java.lang.Object)
>>>>     at org.apache.tez.dag.app.rm.TaskScheduler.serviceStop(TaskScheduler.java:394)
>>>>     - locked <7c3006aa0> (a org.apache.tez.dag.app.rm.TaskScheduler)
>>>>     at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>>>>     - locked <7c3038008> (a java.lang.Object)
>>>>     at org.apache.tez.dag.app.rm.TaskSchedulerEventHandler.serviceStop(TaskSchedulerEventHandler.java:357)
>>>>     at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>>>>     - locked <7c2f71360> (a java.lang.Object)
>>>>     at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
>>>>     at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
>>>>     at org.apache.tez.dag.app.DAGAppMaster.stopServices(DAGAppMaster.java:1518)
>>>>     at org.apache.tez.dag.app.DAGAppMaster.serviceStop(DAGAppMaster.java:1649)
>>>>     - locked <7c2f51790> (a org.apache.tez.dag.app.DAGAppMaster)
>>>>     at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>>>>     - locked <7c2fed728> (a java.lang.Object)
>>>>     at org.apache.tez.dag.app.DAGAppMaster$DAGAppMasterShutdownHandler$AMShutdownRunnable.run(DAGAppMaster.java:607)
>>>>     at java.lang.Thread.run(Thread.java:695)
>>>>
>>>> "AMRM Heartbeater thread" prio=5 tid=7f9a0c0e8800 nid=0x111e70000 waiting on condition [111e6f000]
>>>>    java.lang.Thread.State: TIMED_WAITING (sleeping)
>>>>     at java.lang.Thread.sleep(Native Method)
>>>>     at org.apache.hadoop.util.ThreadUtil.sleepAtLeastIgnoreInterrupts(ThreadUtil.java:43)
>>>>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:150)
>>>>     at com.sun.proxy.$Proxy9.allocate(Unknown Source)
>>>>     at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.allocate(AMRMClientImpl.java:246)
>>>>     at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$HeartbeatThread.run(AMRMClientAsyncImpl.java:224)
>>>>     - locked <7c3041e28> (a java.lang.Object)
>>>>
>>>> public void unregisterApplicationMaster(FinalApplicationStatus appStatus,
>>>>     String appMessage, String appTrackingUrl) throws YarnException,
>>>>     IOException {
>>>>   synchronized (unregisterHeartbeatLock) {
>>>>     keepRunning = false;
>>>>     client.unregisterApplicationMaster(appStatus, appMessage,
>>>>         appTrackingUrl);
>>>>   }
>>>> }
>>>>
>>>> The line "keepRunning = false" should be outside the synchronized block.
>>>>
>>>> I am not sure whether this should be regarded as a problem in YARN or
>>>> in Tez. The flag is private and can't be accessed by the Tez
>>>> implementation, TezAMRMClientAsync.
>>>>
>>>> --
>>>> Cheers,
>>>> *Subroto Sanyal*
>>
>> --
>> Cheers,
>> *Subroto Sanyal*
>
> --
> Cheers,
> *Subroto Sanyal*
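
For readers of the thread, the change suggested in the quoted message (setting keepRunning before entering the synchronized block) might look roughly like the sketch below. This is not the YARN or Tez code: the class HeartbeatSketch and the helpers slowAllocate, pause, awaitStopped and isUnregistered are hypothetical stand-ins, and only the names keepRunning and unregisterHeartbeatLock come from the quoted snippet. The sketch also assumes the flag is made volatile so that a write outside the lock is visible to the heartbeat thread.

```java
/** Simplified, hypothetical stand-in for a client with a heartbeat thread. */
class HeartbeatSketch {
    private final Object unregisterHeartbeatLock = new Object();
    // Assumption: volatile, so a write made outside the lock is visible to the heartbeat thread.
    private volatile boolean keepRunning = true;
    private volatile boolean unregistered = false;
    private final Thread heartbeatThread =
        new Thread(this::heartbeatLoop, "heartbeat-sketch");

    void start() {
        heartbeatThread.setDaemon(true);
        heartbeatThread.start();
    }

    private void heartbeatLoop() {
        while (keepRunning) {
            synchronized (unregisterHeartbeatLock) {
                if (!keepRunning) {
                    return;
                }
                slowAllocate(); // stands in for client.allocate(), which may retry for a while
            }
            pause(10); // heartbeat interval, spent outside the lock
        }
    }

    void unregister() {
        keepRunning = false; // set BEFORE taking the lock, as proposed in the thread
        synchronized (unregisterHeartbeatLock) {
            unregistered = true; // stands in for client.unregisterApplicationMaster(...)
        }
    }

    /** Waits up to the given time for the heartbeat thread; returns true if it has exited. */
    boolean awaitStopped(long millis) {
        try {
            heartbeatThread.join(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !heartbeatThread.isAlive();
    }

    boolean isUnregistered() {
        return unregistered;
    }

    private void slowAllocate() {
        pause(50); // simulated remote call made while holding the lock
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        HeartbeatSketch sketch = new HeartbeatSketch();
        sketch.start();
        sketch.unregister();
        System.out.println("heartbeat stopped: " + sketch.awaitStopped(5000));
    }
}
```

Note that in this sketch unregister() still has to wait for any in-flight slowAllocate() call to return before it can take the lock; moving the flag only ensures the heartbeat loop stops issuing new calls. Whether that is enough in the real client depends on how long the allocate retries inside the lock can run.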