[ 
https://issues.apache.org/jira/browse/TEZ-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14306095#comment-14306095
 ] 

Siddharth Seth commented on TEZ-1967:
-------------------------------------

I think it'll be better for the API to be simpler.
Instead of 
{code}
public abstract DAGStatus getDAGStatus(@Nullable Set<StatusGetOpts> statusOptions,
    @Nullable DAGStatus clientDAGStatus, long timeout, long pollInterval)
{code}
the following will be better
{code}
public abstract DAGStatus getDAGStatus(@Nullable Set<StatusGetOpts> statusOptions,
    long timeout)
{code}

The most common use case would be monitoring for completion, so instead of 
returning on any state change, this returns at timeout or when the DAG enters 
a final state. An alternative would be to return on any state change; the 
implementation should not be too different in either case. I think the first 
is more useful.
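
To illustrate the "return at timeout or on final state" semantics from a caller's point of view, here is a minimal sketch. SketchStatus, SketchStatusClient and CompletionMonitor are hypothetical stand-ins invented for this example; they are not the Tez DAGStatus or DAGClient types, and the blocking behaviour of getStatus is an assumption based on the proposal above.

```java
import java.util.function.Consumer;

// Hypothetical stand-in for DAGStatus; not the Tez type.
final class SketchStatus {
    final String state;
    final boolean completed;

    SketchStatus(String state, boolean completed) {
        this.state = state;
        this.completed = completed;
    }
}

// Hypothetical stand-in for the proposed client call.
interface SketchStatusClient {
    // Assumed semantics: blocks for up to timeoutMillis and returns early
    // only when the DAG has reached a final state.
    SketchStatus getStatus(long timeoutMillis);
}

final class CompletionMonitor {
    // Each return from getStatus is either a timeout (report progress and
    // call again) or completion (stop). No client-side sleep is needed.
    static SketchStatus monitorUntilDone(SketchStatusClient client,
                                         long timeoutMillis,
                                         Consumer<SketchStatus> onProgress) {
        while (true) {
            SketchStatus status = client.getStatus(timeoutMillis);
            if (status.completed) {
                return status;
            }
            onProgress.accept(status);
        }
    }
}
```

With this shape the timeout doubles as the progress-reporting interval, and completion is seen as soon as the call returns rather than after the next poll.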

In terms of the poll interval - I believe that's primarily required if the App 
is not in RUNNING state. We should just hardcode that for now; this can be set 
up as an advanced configuration parameter at a later point if we want it to be 
configurable.
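
A rough sketch of how a hardcoded poll interval might be used while the app is not yet RUNNING. SketchAppState, RunningStateWaiter and the 100 ms value are assumptions made up for illustration; none of them come from the patch or from YARN.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical stand-in for the application state reported by the RM; not a YARN type.
enum SketchAppState { SUBMITTED, ACCEPTED, RUNNING, FINISHED }

final class RunningStateWaiter {
    // Assumed value for illustration; the comment only says to hardcode some interval.
    static final long POLL_INTERVAL_MILLIS = 100L;

    // Polls the supplied state until the app is RUNNING (or already FINISHED)
    // or the timeout expires. Returns the milliseconds of the timeout that
    // are left, or 0 if none remain.
    static long waitForRunning(Supplier<SketchAppState> appState, long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (true) {
            SketchAppState current = appState.get();
            long remaining = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
            if (current == SketchAppState.RUNNING || current == SketchAppState.FINISHED) {
                return Math.max(0L, remaining);
            }
            if (remaining <= 0L) {
                return 0L;
            }
            try {
                // Never sleep past the caller's deadline.
                Thread.sleep(Math.min(POLL_INTERVAL_MILLIS, remaining));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return 0L;
            }
        }
    }
}
```

The returned value is the part of the caller's timeout that could still be spent on a blocking call once the app is running.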

On the patch itself:
I'm a little confused about what the checks on dagStatus in the client are 
doing. Shouldn't the client sleep until the RM state moves to RUNNING, after 
which the leftover time is sent to the AM?
Within the AM itself, there should be a wait that is notified when the DAG 
reaches a final state.
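
The AM-side wait could look roughly like the sketch below. SketchDagState and DagCompletionWaiter are hypothetical names used only for illustration, not classes from Tez or from the patch, and the use of a ReentrantLock with a Condition is one possible choice rather than what the AM actually does.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for the DAG state tracked inside the AM; not a Tez type.
enum SketchDagState {
    RUNNING, SUCCEEDED, FAILED, KILLED;

    boolean isFinal() {
        return this != RUNNING;
    }
}

// Hypothetical helper: lets an RPC handler block until the DAG reaches a
// final state or the timeout expires.
final class DagCompletionWaiter {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition finalStateReached = lock.newCondition();
    private SketchDagState state = SketchDagState.RUNNING;

    // Would be called by whatever drives DAG state transitions.
    void onStateChange(SketchDagState newState) {
        lock.lock();
        try {
            state = newState;
            if (newState.isFinal()) {
                finalStateReached.signalAll();
            }
        } finally {
            lock.unlock();
        }
    }

    // Blocks for up to timeoutMillis and returns the state seen on wake-up.
    SketchDagState awaitFinalState(long timeoutMillis) {
        long remainingNanos = TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        lock.lock();
        try {
            // Loop to tolerate spurious wake-ups.
            while (!state.isFinal() && remainingNanos > 0L) {
                try {
                    remainingNanos = finalStateReached.awaitNanos(remainingNanos);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
            return state;
        } finally {
            lock.unlock();
        }
    }
}
```

Signalling only on final states matches the "monitor for completion" variant; signalling on every transition and returning after the first wake-up would give the "any state change" variant instead.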


> Add a monitoring API on DAGClient which returns after a time interval or on 
> DAG state change
> --------------------------------------------------------------------------------------------
>
>                 Key: TEZ-1967
>                 URL: https://issues.apache.org/jira/browse/TEZ-1967
>             Project: Apache Tez
>          Issue Type: Improvement
>    Affects Versions: 0.7.0
>            Reporter: Siddharth Seth
>            Assignee: Vasanth kumar RJ
>             Fix For: 0.7.0
>
>         Attachments: TEZ-1967-InitialReview.patch, TEZ-1967.1.patch
>
>
> To monitor a running DAG, clients end up using DAGClient.getDAGStatus in a 
> loop with a poll interval.
> In the worst case, they find out about DAG completion, failure etc only after 
> the poll interval.
> Instead, an API can be added which waits on the AM for a specified interval, 
> but can return earlier if the DAG state changes.
> This will end up blocking RPC handlers - but that isn't a problem since we 
> don't have many entities querying for DAG status.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
