[ 
https://issues.apache.org/jira/browse/TEZ-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15358739#comment-15358739
 ] 

Sreenath Somarajapuram commented on TEZ-3318:
---------------------------------------------

Yes.

The polling-retry logic is as follows:
1. On loading a record, we also load the related application details from RM > 
AHS (used for status correction).
2. If the application status is not one of the complete states (SUCCEEDED, 
FINISHED, FAILED, KILLED), we start polling.
3. The function hooked into the polling service tries to fetch data from AM.
- 3.1 If it succeeds, the models are updated and polling continues (a poll 
happens 3 seconds after the previous poll returns).
- 3.2 If any poll fails, the onPollFailure hook is called. It checks 
with RM whether the application is complete.
-- 3.2.1 If the application is complete, stop polling and schedule a reload 
after 6 seconds. So after 6 seconds things start again from #1.
-- 3.2.2 If the application is not complete as per RM, show an error bar saying 
that AM is not reachable and continue polling AM.
-- 3.2.3 If RM is not reachable, show an error bar about RM and schedule a 
reload after 6 seconds. So after 6 seconds things start again from #1.

In other words, the retry logic doesn't use the polling service (the interval 
is not reset), but it organically falls into a similar pattern when needed.
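
The steps above can be sketched as a small decision function. This is only an 
illustration in Python, not the Tez UI code: the names should_poll, 
on_poll_result, Action and Step are hypothetical, an unreachable RM is modelled 
as rm_status=None, and reusing the 3 second interval in case 3.2.2 is an 
assumption based on "the interval is not reset".

```python
from enum import Enum
from typing import NamedTuple, Optional

# Complete states listed in step 2.
COMPLETE_STATES = frozenset({"SUCCEEDED", "FINISHED", "FAILED", "KILLED"})
POLL_INTERVAL_SECONDS = 3  # step 3.1
RELOAD_DELAY_SECONDS = 6   # steps 3.2.1 and 3.2.3


class Action(Enum):
    POLL_AM = "poll_am"  # keep polling the AM
    RELOAD = "reload"    # stop polling, reload the record (back to step 1)


class Step(NamedTuple):
    action: Action
    delay_seconds: int
    error: Optional[str] = None  # text for the error bar, if any


def should_poll(app_status: str) -> bool:
    """Step 2: poll only while the application is not complete."""
    return app_status not in COMPLETE_STATES


def on_poll_result(poll_succeeded: bool, rm_status: Optional[str]) -> Step:
    """Step 3: decide what happens after one poll of the AM.

    rm_status is only consulted when the poll failed; None stands for
    "RM is not reachable" (a modelling assumption of this sketch).
    """
    if poll_succeeded:
        # 3.1: models are updated, next poll after the usual interval.
        return Step(Action.POLL_AM, POLL_INTERVAL_SECONDS)
    if rm_status is None:
        # 3.2.3: RM unreachable, show the RM error and reload later.
        return Step(Action.RELOAD, RELOAD_DELAY_SECONDS, "RM is not reachable")
    if rm_status in COMPLETE_STATES:
        # 3.2.1: application finished, stop polling and reload later.
        return Step(Action.RELOAD, RELOAD_DELAY_SECONDS)
    # 3.2.2: application still running as per RM, so the AM is the problem.
    # Assumption: polling continues at the unchanged interval.
    return Step(Action.POLL_AM, POLL_INTERVAL_SECONDS, "AM is not reachable")
```

In this sketch the RM recovery case follows from 3.2.3: each scheduled reload 
restarts from #1, so once RM answers again and the application is still 
running, should_poll returns True and polling resumes.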

> Tez UI: Polling is not restarted after RM recovery
> --------------------------------------------------
>
>                 Key: TEZ-3318
>                 URL: https://issues.apache.org/jira/browse/TEZ-3318
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Sreenath Somarajapuram
>            Assignee: Sreenath Somarajapuram
>         Attachments: TEZ-3318.1.patch
>
>
> For a running DAG, we poll the AM to get progress and other realtime 
> information. This communication happens via RM. If RM goes down, the polling 
> is not re-established even after its recovery.
> Steps to repro:
> 1. Run a job
> 2. Go to DAG details page, and ensure that the progress is getting updated.
> 3. Stop RM, and ensure that error bar is getting displayed in the UI.
> 4. Start RM.
> 5. As soon as RM is online, the progress bar must get updated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
