[ 
https://issues.apache.org/jira/browse/TEZ-2307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15120974#comment-15120974
 ] 

Jeff Zhang commented on TEZ-2307:
---------------------------------

Attached a new patch.  [~sseth], please help review.

* This patch has one drawback: the DAG submission RPC blocks if cleanup of the 
previous DAG has not finished. I expect DAG cleanup to be quick, though, and we 
can add a timeout if necessary. 
* DAGImpl.finish needs to set the dagCleanupDone flag; otherwise the next DAG 
submission cannot tell whether cleanup of the previous DAG is done. 
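
To illustrate the blocking behavior and the optional timeout described above, 
here is a minimal sketch. It is not the code in the attached patch: the class 
DagCleanupGate and its methods dagStarted, cleanupDone and awaitCleanup are 
hypothetical names chosen for this example, and only the dagCleanupDone flag 
name comes from the comment. It assumes the submission thread simply waits on a 
monitor until the finish path sets the flag.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper for illustration only; not part of Tez. */
final class DagCleanupGate {
    // True when no previous DAG is still waiting for cleanup.
    private boolean dagCleanupDone = true;

    /** Called when a DAG starts running: its cleanup is now pending. */
    synchronized void dagStarted() {
        dagCleanupDone = false;
    }

    /** Called from the DAG finish path once cleanup has completed. */
    synchronized void cleanupDone() {
        dagCleanupDone = true;
        notifyAll(); // wake any submission blocked in awaitCleanup
    }

    /**
     * Blocks the submitting thread until cleanup is done or the timeout elapses.
     * Returns true if cleanup finished, false on timeout or interruption.
     */
    synchronized boolean awaitCleanup(long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        try {
            while (!dagCleanupDone) { // loop guards against spurious wakeups
                long remainingNanos = deadline - System.nanoTime();
                if (remainingNanos <= 0) {
                    return false; // timed out while cleanup is still pending
                }
                TimeUnit.NANOSECONDS.timedWait(this, remainingNanos);
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        }
    }
}
```

In a sketch like this, the submit path would call awaitCleanup before accepting 
the new DAG and could return a more specific error when it gets false, instead 
of blocking the RPC handler indefinitely.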


> Possible wrong error message when submitting new dag
> ----------------------------------------------------
>
>                 Key: TEZ-2307
>                 URL: https://issues.apache.org/jira/browse/TEZ-2307
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Jeff Zhang
>            Assignee: Jeff Zhang
>         Attachments: TEZ-2307-1.patch, TEZ-2307-2.patch
>
>
> In the following two cases, the AM propagates a wrong error message to the 
> client ("App master already running a DAG"):
> * The last DAG has completed but the AM is still in RUNNING state.
> * The AM is shutting down.
> {code}
> 2015-04-10 06:01:50,369 INFO  [IPC Server handler 0 on 46821] ipc.Server (Server.java:run(2070)) - IPC Server handler 0 on 46821, call org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPB.submitDAG from 10.0.0.223:48581 Call#411 Retry#0
> org.apache.tez.dag.api.TezException: App master already running a DAG
>       at org.apache.tez.dag.app.DAGAppMaster.submitDAGToAppMaster(DAGAppMaster.java:1131)
>       at org.apache.tez.dag.api.client.DAGClientHandler.submitDAG(DAGClientHandler.java:118)
>       at org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPBServerImpl.submitDAG(DAGClientAMProtocolBlockingPBServerImpl.java:163)
>       at org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolRPC$DAGClientAMProtocol$2.callBlockingMethod(DAGClientAMProtocolRPC.java:7471)
>       at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>       at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
>       at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
>       at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:415)
>       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>       at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
