[ 
https://issues.apache.org/jira/browse/TEZ-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14581447#comment-14581447
 ] 

Hitesh Shah commented on TEZ-2548:
----------------------------------

From an offline mail:

{code}
 - org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(org.apache.hadoop.yarn.api.records.timeline.TimelineEntity[]) @bci=19, line=305 (Interpreted frame)
 - org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService.handleEvents(java.util.List) @bci=188, line=343 (Interpreted frame)
 - org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService.serviceStop() @bci=273, line=229 (Interpreted frame)
 - org.apache.hadoop.service.AbstractService.stop() @bci=32, line=221 (Interpreted frame)
 - org.apache.hadoop.service.ServiceOperations.stop(org.apache.hadoop.service.Service) @bci=5, line=52 (Interpreted frame)
 - org.apache.hadoop.service.ServiceOperations.stopQuietly(org.apache.commons.logging.Log, org.apache.hadoop.service.Service) @bci=1, line=80 (Interpreted frame)
 - org.apache.hadoop.service.CompositeService.stop(int, boolean) @bci=115, line=157 (Interpreted frame)
 - org.apache.hadoop.service.CompositeService.serviceStop() @bci=58, line=131 (Interpreted frame)
 - org.apache.tez.dag.history.HistoryEventHandler.serviceStop() @bci=11, line=80 (Interpreted frame)
 - org.apache.hadoop.service.AbstractService.stop() @bci=32, line=221 (Interpreted frame)
 - org.apache.hadoop.service.ServiceOperations.stop(org.apache.hadoop.service.Service) @bci=5, line=52 (Interpreted frame)
 - org.apache.hadoop.service.ServiceOperations.stopQuietly(org.apache.commons.logging.Log, org.apache.hadoop.service.Service) @bci=1, line=80 (Interpreted frame)
 - org.apache.hadoop.service.ServiceOperations.stopQuietly(org.apache.hadoop.service.Service) @bci=4, line=65 (Interpreted frame)
 - org.apache.tez.dag.app.DAGAppMaster.stopServices() @bci=137, line=1724 (Interpreted frame)
 - org.apache.tez.dag.app.DAGAppMaster.serviceStop() @bci=30, line=1880 (Interpreted frame)
 - org.apache.hadoop.service.AbstractService.stop() @bci=32, line=221 (Interpreted frame)
 - org.apache.tez.dag.app.DAGAppMaster$DAGAppMasterShutdownHandler$AMShutdownRunnable.run() @bci=48, line=870 (Interpreted frame)
 - java.lang.Thread.run() @bci=11, line=745 (Interpreted frame)
{code}

However, the DAG AppMaster's RPC sockets are not closed and the container 
does not exit, so the AM locks up when a submitDAG RPC comes through 
instead of erroring out with a SessionNotRunning.

{code}
Thread 16628: (state = BLOCKED)
 - org.apache.tez.dag.app.DAGAppMaster.submitDAGToAppMaster(org.apache.tez.dag.api.records.DAGProtos$DAGPlan, java.util.Map) @bci=0, line=1230 (Interpreted frame)
 - org.apache.tez.dag.api.client.DAGClientHandler.submitDAG(org.apache.tez.dag.api.records.DAGProtos$DAGPlan, java.util.Map) @bci=6, line=118 (Interpreted frame)
 - org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPBServerImpl.submitDAG(com.google.protobuf.RpcController, org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolRPC$SubmitDAGRequestProto) @bci=84, line=163 (Interpreted frame)
 - org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolRPC$DAGClientAMProtocol$2.callBlockingMethod(com.google.protobuf.Descriptors$MethodDescriptor, com.google.protobuf.RpcController, com.google.protobuf.Message) @bci=137, line=7471 (Compiled frame)
 - org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(org.apache.hadoop.ipc.RPC$Server, java.lang.String, org.apache.hadoop.io.Writable, long) @bci=246, line=616 (Compiled frame)
 - org.apache.hadoop.ipc.RPC$Server.call(org.apache.hadoop.ipc.RPC$RpcKind, java.lang.String, org.apache.hadoop.io.Writable, long) @bci=9, line=972 (Compiled frame)
 - org.apache.hadoop.ipc.Server$Handler$1.run() @bci=38, line=2085 (Compiled frame)
 - org.apache.hadoop.ipc.Server$Handler$1.run() @bci=1, line=2081 (Compiled frame)
 - java.security.AccessController.doPrivileged(java.security.PrivilegedExceptionAction, java.security.AccessControlContext) @bci=0 (Compiled frame)
 - javax.security.auth.Subject.doAs(javax.security.auth.Subject, java.security.PrivilegedExceptionAction) @bci=42, line=422 (Compiled frame)
 - org.apache.hadoop.security.UserGroupInformation.doAs(java.security.PrivilegedExceptionAction) @bci=14, line=1654 (Compiled frame)
 - org.apache.hadoop.ipc.Server$Handler.run() @bci=308, line=2081 (Interpreted frame)
{code}

The Hive CLI then hangs after a DAG submission, locking up a whole thread 
of tests because ATS is not responding.
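
To make the lockout concrete, here is a minimal, self-contained sketch of the pattern the two traces suggest: a stop method that holds the object monitor while a flush does not return, and a submit method that needs the same monitor. Everything in it is hypothetical (the class SketchAppMaster, the method names, the latches standing in for the ATS call); it is not Tez code and not necessarily what the attached patch does. The fail-fast variant only illustrates one possible way to reject a submission instead of blocking.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

class ShutdownLockoutSketch {
    /** Hypothetical stand-in for the AM; names are illustrative, not Tez code. */
    static class SketchAppMaster {
        private final AtomicBoolean stopping = new AtomicBoolean(false);
        final CountDownLatch stopHoldsMonitor = new CountDownLatch(1);
        final CountDownLatch flushMayFinish = new CountDownLatch(1);

        // Holds the object monitor for the whole shutdown, including a slow flush.
        synchronized void serviceStop() throws InterruptedException {
            stopping.set(true);
            stopHoldsMonitor.countDown();
            flushMayFinish.await(); // stands in for a history flush that does not return
        }

        // Needs the same monitor as serviceStop(), so a caller waits until shutdown ends.
        synchronized String submitDagSynchronized(String dag) {
            return "accepted " + dag;
        }

        // Fail-fast variant (an assumption): check a flag before contending for the monitor.
        String submitDagFailFast(String dag) {
            if (stopping.get()) {
                throw new IllegalStateException("AM is shutting down, rejecting " + dag);
            }
            return submitDagSynchronized(dag);
        }
    }

    /** Returns { state of the synchronized submitter thread, outcome of the fail-fast submit }. */
    static String[] demo() throws InterruptedException {
        SketchAppMaster am = new SketchAppMaster();
        Thread stopper = new Thread(() -> {
            try {
                am.serviceStop();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        stopper.start();
        am.stopHoldsMonitor.await(); // shutdown now owns the monitor

        Thread submitter = new Thread(() -> am.submitDagSynchronized("dag-1"));
        submitter.start();
        // Poll (bounded) until the submitter is parked on the monitor.
        long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
        while (submitter.getState() != Thread.State.BLOCKED && System.nanoTime() < deadline) {
            Thread.sleep(10);
        }
        String submitterState = submitter.getState().toString();

        String failFast;
        try {
            failFast = am.submitDagFailFast("dag-2");
        } catch (IllegalStateException e) {
            failFast = "rejected";
        }

        am.flushMayFinish.countDown(); // let the simulated flush return so threads can exit
        stopper.join();
        submitter.join();
        return new String[] { submitterState, failFast };
    }

    public static void main(String[] args) throws InterruptedException {
        String[] result = demo();
        System.out.println("synchronized submit thread state: " + result[0]);
        System.out.println("fail-fast submit: " + result[1]);
    }
}
```

The flag check alone still leaves a window in which shutdown can start between the check and the monitor acquisition, so a real fix would need to handle that case as well.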

> TezClient submitDAG can hang if the AM is in the process of shutting down
> -------------------------------------------------------------------------
>
>                 Key: TEZ-2548
>                 URL: https://issues.apache.org/jira/browse/TEZ-2548
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Hitesh Shah
>            Assignee: Hitesh Shah
>         Attachments: TEZ-2548.1.patch
>
>
> submitDAG and serviceStop are both synchronized, causing submitDAG to be 
> locked out during the shutdown process. 
> Seen by [~gopalv]


