[ 
https://issues.apache.org/jira/browse/TEZ-2307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15123898#comment-15123898
 ] 

Siddharth Seth commented on TEZ-2307:
-------------------------------------

bq. I thought about that. but it would make user confused that the last dag is 
completed but he still can not submit another dag due to AM is still in RUNNING.
I thought this is what this jira is fixing? Run the new DAG after the previous 
one is complete, taking into account errors from the new dag and cleanup of the 
old dag.

bq. For now it seems dag clean up won't take too much, have you thought to put 
it in DAGImpl.finish ?
Cleanup sends messages to user plugins. Calling it within DAGImpl.finish would 
mean a dag status lookup from the plugins would get the state as RUNNING, 
instead of the actual final state. DAG_CLEANUP was added as a new state in the 
DAGAppMaster state machine to allow any events still pending in the queue 
after "DAGAppMasterEventDAGFinished" to be processed. If you think there are 
no other such events, the DAG_CLEANUP state can be collapsed into 
DAG_FINISHED, in which case DAGAppMasterState.IDLE will be reached after 
cleanup. Otherwise, I think it's better to move the transition to the IDLE 
state into DAG_CLEANUP handling. In either case, notify after the state is 
IDLE, so that the new submission can proceed after the old dag is cleaned up.
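
To make the ordering concrete, here is a rough sketch of what I mean by "clean 
up, move to IDLE, then notify". This is not the actual DAGAppMaster code: the 
class AmStateSketch, its methods, the simplified State enum and the timeout 
handling are all made up for illustration, and the real state machine is event 
driven rather than a single lock.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the AM state handling discussed above.
// None of these names exist in Tez; they only illustrate the ordering.
class AmStateSketch {
    enum State { IDLE, RUNNING, DAG_CLEANUP }

    private final Object lock = new Object();
    private State state = State.IDLE;
    // Records the order of operations so the ordering can be inspected.
    final List<String> log = new ArrayList<>();

    /** Waits until the AM is IDLE (or the timeout elapses), then accepts the DAG. */
    boolean submitDag(String name, long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        synchronized (lock) {
            while (state != State.IDLE) {
                long remaining = deadline - System.currentTimeMillis();
                if (remaining <= 0) {
                    return false; // still busy with the previous DAG
                }
                try {
                    lock.wait(remaining);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
            state = State.RUNNING;
            log.add("submitted " + name);
            return true;
        }
    }

    /** Called once the DAG has reached its final state. */
    void onDagFinished(String name) {
        synchronized (lock) {
            state = State.DAG_CLEANUP;
            log.add("cleanup " + name); // stand-in for the plugin cleanup calls
            state = State.IDLE;         // IDLE is only reached after cleanup
            log.add("idle");
            lock.notifyAll();           // notify waiting submitters last
        }
    }

    State getState() {
        synchronized (lock) {
            return state;
        }
    }
}
```

With this ordering a second submitDag call cannot be accepted until the 
cleanup of the previous dag is done, and a client that gives up waiting could 
be told that the AM is still finishing the previous dag rather than "App 
master already running a DAG".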

> Possible wrong error message when submitting new dag
> ----------------------------------------------------
>
>                 Key: TEZ-2307
>                 URL: https://issues.apache.org/jira/browse/TEZ-2307
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Jeff Zhang
>            Assignee: Jeff Zhang
>         Attachments: TEZ-2307-1.patch, TEZ-2307-2.patch, TEZ-2307-3.patch, 
> TEZ-2307-4.patch
>
>
> In the following 2 cases, AM would propagate wrong error message to client 
> ("App master already running a DAG")
> * The last dag is completed but AM is still in RUNNING state
> * AM is in shutting down. 
> {code}
> 2015-04-10 06:01:50,369 INFO  [IPC Server handler 0 on 46821] ipc.Server 
> (Server.java:run(2070)) - IPC Server handler 0 on 46821, call 
> org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPB.submitDAG 
> from 10.0.0.223:48581 Call#411 Retry#0
> org.apache.tez.dag.api.TezException: App master already running a DAG
>       at 
> org.apache.tez.dag.app.DAGAppMaster.submitDAGToAppMaster(DAGAppMaster.java:1131)
>       at 
> org.apache.tez.dag.api.client.DAGClientHandler.submitDAG(DAGClientHandler.java:118)
>       at 
> org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPBServerImpl.submitDAG(DAGClientAMProtocolBlockingPBServerImpl.java:163)
>       at 
> org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolRPC$DAGClientAMProtocol$2.callBlockingMethod(DAGClientAMProtocolRPC.java:7471)
>       at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>       at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
>       at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
>       at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:415)
>       at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>       at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
