[ 
https://issues.apache.org/jira/browse/TEZ-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14136890#comment-14136890
 ] 

Prakash Ramachandran commented on TEZ-1495:
-------------------------------------------

[~bikassaha] thanks for the review.

DAGClientRPCImpl.waitForCompletion used to call DAGClientRPCImpl.getDAGStatus. 
When the AM was not reachable, getDAGStatus fetched the status from the RM to 
distinguish an AM restart from a completed AM (for example, when the user 
calls getDAGStatus after completion). I have moved this logic to DAGClientImpl. 

So if waitForCompletion stays in DAGClientRPCImpl, the checks for DAG 
completion (catching DAGNotRunningException), the YARN application state, etc. 
would have to live inside DAGClientRPCImpl.getDAGStatus or waitForCompletion, 
which I thought we wanted to avoid. 

Also, when the status is fetched from the RM, only the state and diagnostics 
are returned, so the progress can show transitions like
RUNNING 10% -> SUBMITTED 0% -> RUNNING 0% -> RUNNING 10%  (when an attempt 
restart is happening).
Without any indication that an attempt restart happened, I thought that would 
be confusing to the user. So DAGClientImpl.getDAGStatus now returns the cached 
value from the AM if the DAG has not completed. This still does not fully solve 
the issue, as the AM can return a status of RUNNING 0% while recovery is 
happening, but that is a different issue.
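
To make the caching behaviour described above concrete, here is a minimal 
sketch. It is not the patch: every type and method name below 
(CachedDagStatusSketch, AmSource, RmSource, Status) is hypothetical and only 
stands in for the AM RPC call and the RM lookup. It assumes the RM side yields 
a state but no progress.

```java
import java.util.Optional;

/** Hypothetical sketch; these types are illustrative, not the Tez classes. */
class CachedDagStatusSketch {
    enum State {
        SUBMITTED, RUNNING, SUCCEEDED, FAILED, KILLED;

        boolean isTerminal() {
            return this == SUCCEEDED || this == FAILED || this == KILLED;
        }
    }

    /** A state plus a progress fraction in [0, 1]. */
    static final class Status {
        final State state;
        final double progress;

        Status(State state, double progress) {
            this.state = state;
            this.progress = progress;
        }
    }

    /** Stand-in for the AM RPC call; empty means the AM was unreachable. */
    interface AmSource {
        Optional<Status> fetch();
    }

    /** Stand-in for the RM lookup; it yields a state but no progress. */
    interface RmSource {
        State fetchState();
    }

    private final AmSource am;
    private final RmSource rm;
    // Last status known to the client; starts as SUBMITTED 0%.
    private Status cached = new Status(State.SUBMITTED, 0.0);

    CachedDagStatusSketch(AmSource am, RmSource rm) {
        this.am = am;
        this.rm = rm;
    }

    Status getDagStatus() {
        Optional<Status> fromAm = am.fetch();
        if (fromAm.isPresent()) {
            // AM reachable: its answer is authoritative, so remember it.
            cached = fromAm.get();
            return cached;
        }
        // AM unreachable: ask the RM whether the application has finished.
        State rmState = rm.fetchState();
        if (rmState.isTerminal()) {
            // The RM gives no progress, so keep the last known value.
            cached = new Status(rmState, cached.progress);
        }
        // Not finished (for example an attempt restart): return the cached
        // status instead of exposing SUBMITTED 0% to the caller.
        return cached;
    }
}
```

With this shape, a caller polling during an attempt restart keeps seeing 
RUNNING 10% until the new attempt answers, rather than SUBMITTED 0%. Once the 
new attempt is reachable its answer replaces the cache, so the RUNNING 0% 
reported during recovery is still visible.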

Let me know your thoughts.

 


> ATS integration for TezClient
> -----------------------------
>
>                 Key: TEZ-1495
>                 URL: https://issues.apache.org/jira/browse/TEZ-1495
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Prakash Ramachandran
>            Assignee: Prakash Ramachandran
>         Attachments: TEZ-1495.1.patch, TEZ-1495.2.patch, TEZ-1495.3.patch, 
> TEZ-1495.4.patch, TEZ-1495.5.patch, TEZ-1495.6.patch, TEZ-1495.WIP.1.patch, 
> tez-1495-branch0-5.1.patch
>
>
> Tez client should automatically redirect to ATS when the AM is not running.
> All APIs exposed ( DAG status, counters, etc ) from the DAGClient should 
> continue to work after the AM has shut down.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
