[ 
https://issues.apache.org/jira/browse/TEZ-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14325037#comment-14325037
 ] 

Siddharth Seth commented on TEZ-1967:
-------------------------------------

[~vasanthkumar] - apologies, I missed the update a week ago.
Just took a look at the updated patch. There are several bits that need fixing.

- The javadoc on the new API should read : Get the status of the specified DAG 
when it reaches a final state, or the timeout expires.
- The API should be marked unstable for now. Some follow-up jiras will likely 
be needed, after which it can be moved to stable.
- Timeout handling - in the client and in the AM: any wait needs to be capped by 
the timeout specified by the user / the timeout in the AM 
(Thread.sleep(pollInterval), for instance, is not capped).
- The implementation of the new API in DAGClient needs some changes.
{code} if (dagStatus.getState() == DAGStatus.State.RUNNING
            || dagStatus.getState() == DAGStatus.State.SUCCEEDED
            || dagStatus.getState() == DAGStatus.State.FAILED
            || dagStatus.getState() == DAGStatus.State.KILLED
            || dagStatus.getState() == DAGStatus.State.ERROR) {
{code}
is followed by another lookup to getDAGStatusViaAM. This should differentiate 
between RUNNING / final states. RUNNING would send the leftover timeout to the 
AM; a final state would return immediately / get the state once more in case 
the previous state was returned by the RM (for now, just query once more with a 
timeout of 0 like it's being done in the patch).
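
To illustrate the two points above (capping the sleep, and treating RUNNING differently from final states), here is a minimal sketch. It does not use the Tez classes: WaitSketch, its State enum, and both method names are hypothetical stand-ins, and the choice to re-query with 0 for anything other than RUNNING is an assumption of the sketch.

```java
/** Hypothetical stand-in for illustration only; not the DAGClient implementation. */
final class WaitSketch {
  // Mirrors the states named in the quoted condition; not the real DAGStatus.State.
  enum State { RUNNING, SUCCEEDED, FAILED, KILLED, ERROR }

  /** Cap a sleep by the time left, so the caller never sleeps past the timeout. */
  static long cappedSleepMs(long pollIntervalMs, long remainingMs) {
    return Math.max(0L, Math.min(pollIntervalMs, remainingMs));
  }

  /**
   * Timeout to pass to the follow-up AM lookup.
   * RUNNING: send whatever is left of the user's timeout.
   * Anything else (final states in this sketch): query once more with 0.
   */
  static long amLookupTimeoutMs(State current, long timeoutMs, long elapsedMs) {
    if (current == State.RUNNING) {
      return Math.max(0L, timeoutMs - elapsedMs);
    }
    return 0L;
  }
}
```

With this shape, Thread.sleep(WaitSketch.cappedSleepMs(pollInterval, remaining)) would replace the uncapped sleep.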

{code}long pollInterval = conf.getLong(
        TezConfiguration.TEZ_DAG_STATUS_POLLINTERVAL_MS,
        TezConfiguration.TEZ_DAG_STATUS_POLLINTERVAL_MS_DEFAULT);
{code}
This shouldn't be fetched from configuration on each invocation of the API.
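
One way to do that is to read the value once at construction time and keep it in a final field. A rough sketch follows; a plain Map stands in for the configuration object, and the class name, key, and default value are all made up for the example (the real key and default live in TezConfiguration).

```java
import java.util.Map;

/** Hypothetical sketch: read the poll interval once, not on every API call. */
final class PollIntervalSketch {
  // Made-up key and default, for illustration only.
  static final String POLL_INTERVAL_KEY = "sketch.poll-interval-ms";
  static final long POLL_INTERVAL_DEFAULT_MS = 500L;

  // Resolved once in the constructor and reused by every later call.
  private final long pollIntervalMs;

  PollIntervalSketch(Map<String, Long> conf) {
    this.pollIntervalMs = conf.getOrDefault(POLL_INTERVAL_KEY, POLL_INTERVAL_DEFAULT_MS);
  }

  long pollIntervalMs() {
    return pollIntervalMs;
  }
}
```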


For future jiras: 
- The sleep within the AM can be improved via monitors, but that can be done in 
a follow-up jira.
- INITED state is returned when communicating with the AM, SUBMITTED state is 
returned when communicating with the RM. That could be used to optimize the 
flow.
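
As a rough idea of what the monitor-based wait in the first item could look like, here is a sketch using plain Object.wait / notifyAll. The class, its State enum, and the method names are hypothetical and not taken from the AM code; it only shows a waiter being woken as soon as the state changes instead of after a full sleep interval.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of a monitor-based wait; not the actual AM implementation. */
final class DagStateMonitorSketch {
  enum State { RUNNING, SUCCEEDED, FAILED, KILLED, ERROR }

  private final Object lock = new Object();
  private State state = State.RUNNING; // guarded by lock

  /** Called when the DAG changes state; wakes every waiting caller. */
  void setState(State newState) {
    synchronized (lock) {
      state = newState;
      lock.notifyAll();
    }
  }

  /** Returns as soon as the state leaves RUNNING, or when the timeout expires. */
  State waitForFinalState(long timeoutMs) {
    long deadlineNanos = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    synchronized (lock) {
      while (state == State.RUNNING) {
        long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadlineNanos - System.nanoTime());
        if (remainingMs <= 0) {
          break; // timed out; also avoids wait(0), which would block indefinitely
        }
        try {
          lock.wait(remainingMs); // loop re-checks state to handle spurious wakeups
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt(); // preserve the interrupt for the caller
          break;
        }
      }
      return state;
    }
  }
}
```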

> Add a monitoring API on DAGClient which returns after a time interval or on 
> DAG state change
> --------------------------------------------------------------------------------------------
>
>                 Key: TEZ-1967
>                 URL: https://issues.apache.org/jira/browse/TEZ-1967
>             Project: Apache Tez
>          Issue Type: Improvement
>    Affects Versions: 0.7.0
>            Reporter: Siddharth Seth
>            Assignee: Vasanth kumar RJ
>             Fix For: 0.7.0
>
>         Attachments: TEZ-1967-InitialReview.patch, TEZ-1967.1.patch, 
> TEZ-1967.2.patch, TEZ-1967.3.patch
>
>
> To monitor a running DAG, clients end up using DAGClient.getDAGStatus in a 
> loop with a poll interval.
> In the worst case, they find out about DAG completion, failure etc only after 
> the poll interval.
> Instead, an API can be added which waits on the AM for a specified interval, 
> but can return earlier if the DAG state changes.
> This will end up blocking RPC handlers - but that isn't a problem since we 
> don't have many entities querying for DAG status.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
