[ 
https://issues.apache.org/jira/browse/TEZ-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14283363#comment-14283363
 ] 

Siddharth Seth commented on TEZ-1967:
-------------------------------------

[~vasanthkumar] - thanks for taking this up.

I looked at the patch; it seems to poll from the client itself.

The intent of the jira is to delay the response from the AM, instead of 
having the client poll constantly or at a fixed interval. Essentially, 
getDagStatus(..., long timeout) sends a request to the AM immediately. The 
request then waits in the AM until either the timeout expires or the status 
of the DAG changes, at which point the AM replies. The goal is to avoid 
polling on the client side. The main changes would be in 
DAGClientAMProtocolBlockingPBServerImpl, DAGClientHandler and DAGAppMaster, 
along with DAGClient and DAGClientImpl, of course, for the new API. 
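
As a rough illustration of the wait-in-the-AM idea, here is a minimal sketch 
in plain Java. It is not Tez code: DagStateWaiter, its State enum, setState 
and awaitChange are hypothetical names made up for this example, and the real 
change would have to be wired through the classes listed above. The handler 
thread blocks on a condition until the state differs from the one the caller 
last saw, or until the timeout runs out, and then returns the current state.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical AM-side holder for the DAG state; not an actual Tez class. */
class DagStateWaiter {
  /** Illustrative states only; assumed for this sketch. */
  public enum State { RUNNING, SUCCEEDED, FAILED, KILLED }

  private final ReentrantLock lock = new ReentrantLock();
  private final Condition stateChanged = lock.newCondition();
  private State state = State.RUNNING;

  /** Called when the DAG transitions; wakes every blocked handler thread. */
  public void setState(State newState) {
    lock.lock();
    try {
      if (newState != state) {
        state = newState;
        stateChanged.signalAll();
      }
    } finally {
      lock.unlock();
    }
  }

  /**
   * Blocks the calling handler until the state differs from lastSeen or
   * timeoutMillis elapses, whichever comes first, then returns the
   * current state.
   */
  public State awaitChange(State lastSeen, long timeoutMillis)
      throws InterruptedException {
    long remainingNanos = TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
    lock.lock();
    try {
      // Loop guards against spurious wakeups; awaitNanos returns the
      // time still left to wait.
      while (state == lastSeen && remainingNanos > 0) {
        remainingNanos = stateChanged.awaitNanos(remainingNanos);
      }
      return state;
    } finally {
      lock.unlock();
    }
  }
}
```

With something like this, a client call returns as soon as the DAG changes 
state, and otherwise no later than the timeout, so the worst-case delay in 
noticing completion no longer depends on a client-side poll interval.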

> Add a monitoring API on DAGClient which returns after a time interval or on 
> DAG state change
> --------------------------------------------------------------------------------------------
>
>                 Key: TEZ-1967
>                 URL: https://issues.apache.org/jira/browse/TEZ-1967
>             Project: Apache Tez
>          Issue Type: Improvement
>    Affects Versions: 0.7.0
>            Reporter: Siddharth Seth
>            Assignee: Vasanth kumar RJ
>             Fix For: 0.7.0
>
>         Attachments: TEZ-1967.1.patch
>
>
> To monitor a running DAG, clients end up using DAGClient.getDAGStatus in a 
> loop with a poll interval.
> In the worst case, they find out about DAG completion, failure etc only after 
> the poll interval.
> Instead, an API can be added which waits on the AM for a specified interval, 
> but can return earlier if the DAG state changes.
> This will end up blocking RPC handlers - but that isn't a problem since we 
> don't have many entities querying for DAG status.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
