[ https://issues.apache.org/jira/browse/TEZ-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14136890#comment-14136890 ]
Prakash Ramachandran commented on TEZ-1495:
-------------------------------------------

[~bikassaha] thanks for the review.

DAGClientRPCImpl.waitForCompletion used to call DAGClientRPCImpl.getDAGStatus. When the AM was not reachable, it fetched the status from the RM to distinguish between an AM restart and a completed AM (for example, when a user calls getDAGStatus after completion). I have moved this logic to DAGClientImpl. So if waitForCompletion is in DAGClientRPCImpl, the checks for DAG completion (a DAGNotRunningException was received), the YARN state check, etc. will be inside DAGClientRPCImpl.getDAGStatus or waitForCompletion, which I thought we wanted to avoid.

Also, when the status is fetched from the RM, only the state and diagnostics are returned, so the progress can show transitions like RUNNING 10% -> SUBMITTED 0% -> RUNNING 0% -> RUNNING 10% (while an attempt restart is happening). Without any indication that an attempt restart happened, I thought that would be confusing to the user. So DAGClientImpl.getDAGStatus now returns the cached value from the AM if the DAG has not completed. This still does not fully solve the issue, as the AM can return a status of RUNNING 0% while recovery is happening, but that is a different issue.

Let me know your thoughts.

> ATS integration for TezClient
> -----------------------------
>
>                 Key: TEZ-1495
>                 URL: https://issues.apache.org/jira/browse/TEZ-1495
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Prakash Ramachandran
>            Assignee: Prakash Ramachandran
>         Attachments: TEZ-1495.1.patch, TEZ-1495.2.patch, TEZ-1495.3.patch, TEZ-1495.4.patch, TEZ-1495.5.patch, TEZ-1495.6.patch, TEZ-1495.WIP.1.patch, tez-1495-branch0-5.1.patch
>
>
> Tez client should automatically redirect to ATS when the AM is not running.
> All APIs exposed (DAG status, counters, etc.) from the DAGClient should continue to work after the AM has shut down.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)