[
https://issues.apache.org/jira/browse/TEZ-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15358739#comment-15358739
]
Sreenath Somarajapuram commented on TEZ-3318:
---------------------------------------------
Yes.
Following is the polling-retry logic:
1. On loading a record, we also load the related application details from RM >
AHS (Used for status correction).
2. If the application status is not complete (SUCCEEDED, FINISHED, FAILED,
KILLED), we start polling.
3. The function hooked into the polling service tries to fetch data from AM.
- 3.1 If it succeeds, the models are updated and polling continues (a poll
happens 3 seconds after the previous poll returns).
- 3.2 If any poll fails, the onPollFailure hook is called. It checks
with RM whether the application is complete.
-- 3.2.1 If the application is complete, stop polling and schedule a reload
after 6 seconds. So after 6 seconds things start again from #1.
-- 3.2.2 If the application is not complete as per RM, show an error bar saying
that AM is not reachable and continue polling AM.
-- 3.2.3 If RM is not reachable, show an error bar about RM and schedule a
reload after 6 seconds. So after 6 seconds things start again from #1.
In other words, the retry logic doesn't use the polling service (the interval
is not reset), but it organically falls into a similar pattern when needed.
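To make the branches in 3.2 easier to follow, here is a rough sketch of the
decision in Python. It is not the Tez UI code (which is JavaScript) and it only
models the decision, not the timers. The names on_poll_failure,
fetch_app_status and Action are made up for illustration; the status names and
the 3 and 6 second delays are the ones from the steps above.

```python
from enum import Enum

# Terminal application states listed in step 2.
COMPLETE_STATES = frozenset({"SUCCEEDED", "FINISHED", "FAILED", "KILLED"})

POLL_INTERVAL_SECONDS = 3  # step 3.1: next poll 3 seconds after the previous one returns
RELOAD_DELAY_SECONDS = 6   # steps 3.2.1 and 3.2.3: reload scheduled after 6 seconds


class Action(Enum):
    """Hypothetical labels for the three outcomes of step 3.2."""
    STOP_AND_RELOAD = "stop_and_reload"    # 3.2.1
    CONTINUE_POLLING = "continue_polling"  # 3.2.2
    RELOAD = "reload"                      # 3.2.3 (whether polling stops is not stated)


def is_complete(app_status):
    """Step 2: True when the application has reached a terminal state."""
    return app_status in COMPLETE_STATES


def on_poll_failure(fetch_app_status):
    """Decide what to do after a failed AM poll (step 3.2).

    fetch_app_status is an assumed callable that returns the application
    status string from RM, or raises ConnectionError when RM is unreachable.
    Returns a tuple (action, delay_seconds, error_bar_message).
    """
    try:
        status = fetch_app_status()
    except ConnectionError:
        # 3.2.3: RM itself is down; show an RM error and retry from step 1 later.
        return (Action.RELOAD, RELOAD_DELAY_SECONDS, "RM is not reachable")

    if is_complete(status):
        # 3.2.1: the application finished, so stop polling and reload the record.
        return (Action.STOP_AND_RELOAD, RELOAD_DELAY_SECONDS, None)

    # 3.2.2: the application is still running per RM, so only the AM is unreachable.
    return (Action.CONTINUE_POLLING, POLL_INTERVAL_SECONDS, "AM is not reachable")
```

With this shape, an RM outage keeps returning the RELOAD outcome every 6
seconds until RM answers again, which is the "similar pattern" mentioned above.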
> Tez UI: Polling is not restarted after RM recovery
> --------------------------------------------------
>
> Key: TEZ-3318
> URL: https://issues.apache.org/jira/browse/TEZ-3318
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Sreenath Somarajapuram
> Assignee: Sreenath Somarajapuram
> Attachments: TEZ-3318.1.patch
>
>
> For a running DAG, we poll the AM to get progress and other realtime
> information. This communication happens via RM. If RM goes down, polling is
> not re-established even after RM recovers.
> Steps to repro:
> 1. Run a job
> 2. Go to DAG details page, and ensure that the progress is getting updated.
> 3. Stop RM, and ensure that error bar is getting displayed in the UI.
> 4. Start RM.
> 5. As soon as RM is online, the progress bar must get updated.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)