[ https://issues.apache.org/jira/browse/TEZ-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14306095#comment-14306095 ]
Siddharth Seth commented on TEZ-1967:
-------------------------------------
I think it'll be better for the API to be simpler.
Instead of
{code}
public abstract DAGStatus getDAGStatus(@Nullable Set<StatusGetOpts> statusOptions,
    @Nullable DAGStatus clientDAGStatus, long timeout, long pollInterval)
{code}
the following will be better
{code}
public abstract DAGStatus getDAGStatus(@Nullable Set<StatusGetOpts> statusOptions,
    long timeout)
{code}
The most common use case would be monitoring for completion; so instead of
returning on any state change, this returns at timeout or when the DAG enters
a final state. An alternative would be to return on any state change - the
implementation should not be too different in either case. I think the first
is more useful.
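To illustrate how a caller might use the simpler signature, here is a rough sketch. The State enum, the StatusGetOpts enum and the DagClient interface below are hypothetical stand-ins written for this example, not the real Tez classes, and the method returns a bare state rather than a DAGStatus to keep it short.

```java
import java.util.Set;

public class MonitorSketch {
    // Hypothetical stand-in for the DAG state; not the real Tez type.
    enum State {
        SUBMITTED, RUNNING, SUCCEEDED, FAILED, KILLED;

        boolean isFinal() {
            return this == SUCCEEDED || this == FAILED || this == KILLED;
        }
    }

    // Hypothetical stand-in for the status options.
    enum StatusGetOpts { GET_COUNTERS }

    // Hypothetical stand-in for the client. The call is assumed to block for
    // up to timeoutMillis and to return early once the DAG is in a final state.
    interface DagClient {
        State getDAGStatus(Set<StatusGetOpts> statusOptions, long timeoutMillis);
    }

    // Monitors for completion: each call either times out (non-final state,
    // so we ask again) or returns as soon as the DAG finishes.
    static State waitForCompletion(DagClient client, long timeoutMillis) {
        State state = client.getDAGStatus(null, timeoutMillis);
        while (!state.isFinal()) {
            state = client.getDAGStatus(null, timeoutMillis);
        }
        return state;
    }
}
```

The caller no longer sleeps between calls; the timeout only bounds how long a single call may block.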
In terms of the poll interval - I believe that's primarily required if the App
is not in RUNNING state. We should just hardcode that for now; this can be set
up as an advanced configuration parameter at a later point if we want it to be
configurable.
On the patch itself.
I'm a little confused about what the checks on dagStatus in the client are
doing. Shouldn't the client sleep until the RM state moves to RUNNING, after
which the leftover time is sent to the AM?
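Roughly what I have in mind on the client side, as a sketch only. The appRunning supplier stands in for whatever check the client does against the RM, and RM_POLL_INTERVAL_MS is a made-up hardcoded value; neither is taken from the patch.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

public class ClientWaitSketch {
    // Hypothetical hardcoded poll interval, used only while the app is not RUNNING.
    static final long RM_POLL_INTERVAL_MS = 10;

    /**
     * Sleeps until appRunning reports true or the timeout expires.
     * Returns the leftover time in milliseconds, which the client would then
     * pass to the AM as the timeout; returns 0 if the budget is used up or
     * the thread is interrupted.
     */
    static long waitForRunning(BooleanSupplier appRunning, long timeoutMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (!appRunning.getAsBoolean()) {
            long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
            if (remainingMs <= 0) {
                return 0;
            }
            try {
                Thread.sleep(Math.min(RM_POLL_INTERVAL_MS, remainingMs));
            } catch (InterruptedException e) {
                // Preserve the interrupt and give up waiting.
                Thread.currentThread().interrupt();
                return 0;
            }
        }
        return Math.max(0L, TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime()));
    }
}
```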
Within the AM itself, there should be a wait - which is notified when the DAG
reaches a final state.
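A minimal sketch of that wait, using plain wait/notifyAll. The class and method names are made up for illustration; in the AM, markFinal() would presumably be called from wherever the DAG transitions to a final state.

```java
public class DagCompletionSketch {
    private boolean finalStateReached = false;

    // Called when the DAG enters a final state; wakes up all waiting callers.
    public synchronized void markFinal() {
        finalStateReached = true;
        notifyAll();
    }

    // Blocks for up to timeoutMs. Returns true if the DAG reached a final
    // state, false if the timeout expired (or the thread was interrupted) first.
    public synchronized boolean awaitFinal(long timeoutMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        try {
            // Loop to guard against spurious wakeups.
            while (!finalStateReached) {
                long remainingMs = (deadline - System.nanoTime()) / 1_000_000L;
                if (remainingMs <= 0) {
                    break; // never call wait(0), which would block indefinitely
                }
                wait(remainingMs);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return finalStateReached;
    }
}
```

An RPC handler would call awaitFinal with the timeout sent by the client and then return the current status either way.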
> Add a monitoring API on DAGClient which returns after a time interval or on
> DAG state change
> --------------------------------------------------------------------------------------------
>
> Key: TEZ-1967
> URL: https://issues.apache.org/jira/browse/TEZ-1967
> Project: Apache Tez
> Issue Type: Improvement
> Affects Versions: 0.7.0
> Reporter: Siddharth Seth
> Assignee: Vasanth kumar RJ
> Fix For: 0.7.0
>
> Attachments: TEZ-1967-InitialReview.patch, TEZ-1967.1.patch
>
>
> To monitor a running DAG, clients end up using DAGClient.getDAGStatus in a
> loop with a poll interval.
> In the worst case, they find out about DAG completion, failure, etc. only
> after the poll interval.
> Instead, an API can be added which waits on the AM for a specified interval,
> but can return earlier if the DAG state changes.
> This will end up blocking RPC handlers - but that isn't a problem since we
> don't have many entities querying for DAG status.