[ https://issues.apache.org/jira/browse/TEZ-4349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
László Bodor resolved TEZ-4349.
-------------------------------
Resolution: Fixed
> DAGClient gets stuck with invalid cached DAGStatus
> --------------------------------------------------
>
> Key: TEZ-4349
> URL: https://issues.apache.org/jira/browse/TEZ-4349
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Ahmed Hussein
> Assignee: Ahmed Hussein
> Priority: Major
> Fix For: 0.10.2
>
> Time Spent: 2h 20m
> Remaining Estimate: 0h
>
> I found that some Oozie launchers get stuck waiting for the job to complete.
> After investigation, I found that {{dagClient.getDAGStatus(null)}} calls the
> overload {{dagClient.getDAGStatus(null, 0)}}, which in turn calls
> {{getDAGStatusInternal}}, which relies on the {{cachedDagStatus}} field.
> The {{cachedDagStatus}} is never invalidated, so once the AM stops answering,
> the same stale status is returned on every call and the launcher waits
> indefinitely.
>
> [https://github.com/apache/tez/blob/master/tez-api/src/main/java/org/apache/tez/dag/api/client/DAGClientImpl.java#L212]
> {code:java}
> if (!dagCompleted) {
>   if (dagStatus != null) {
>     cachedDagStatus = dagStatus;
>     return dagStatus;
>   }
>   if (cachedDagStatus != null) {
>     // could not get from AM (not reachable/ was killed). return cached status.
>     return cachedDagStatus;
>   }
> }
> {code}
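A minimal sketch of why the launcher never finishes, assuming a simplified model of the branch above. {{StaleCacheDemo}}, {{Status}}, and the {{amFetch}} supplier are hypothetical names, not Tez API, and the {{dagCompleted}} check is omitted because the DAG never reaches a completed state in this scenario. After the AM answers once and then becomes unreachable, every poll returns the stale RUNNING value.

```java
import java.util.function.Supplier;

// Hypothetical, simplified model of the quoted branch; not the real DAGClientImpl.
public class StaleCacheDemo {
    public enum Status { RUNNING, SUCCEEDED }

    private final Supplier<Status> amFetch; // returns null when the AM is unreachable
    private Status cachedDagStatus;         // mirrors the field in the excerpt

    public StaleCacheDemo(Supplier<Status> amFetch) {
        this.amFetch = amFetch;
    }

    public Status getDAGStatus() {
        Status dagStatus = amFetch.get();
        if (dagStatus != null) {
            cachedDagStatus = dagStatus;
            return dagStatus;
        }
        // AM unreachable: the cached value is returned with no expiry.
        return cachedDagStatus;
    }

    public static void main(String[] args) {
        int[] amCalls = {0};
        // The AM answers RUNNING once; every later call fails (null).
        StaleCacheDemo client = new StaleCacheDemo(
                () -> amCalls[0]++ == 0 ? Status.RUNNING : null);
        int polls = 0;
        // Bounded for the demo; a launcher polling until completion never stops.
        while (polls < 1000 && client.getDAGStatus() != Status.SUCCEEDED) {
            polls++;
        }
        System.out.println("gave up after " + polls + " polls");
    }
}
```

Running it prints {{gave up after 1000 polls}}; without the bound, the loop would spin forever, which matches the stuck launchers.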
> +To Fix:+
> The {{cachedDagStatus}} should be valid only for a certain amount of time or a
> certain number of retries.
> When the {{cachedDagStatus}} expires, the DAGClient tries to pull the status
> from the AM or the RM.
> If fetching the status fails from both the AM and the RM, null is returned to
> the caller.
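A minimal sketch of the proposed fix, assuming the cache expires on whichever comes first: a TTL, or a cap on how many consecutive AM failures the cached value may cover. {{ExpiringStatusClient}}, {{amFetch}}, {{rmFetch}}, {{ttlMillis}}, and {{maxRetries}} are hypothetical names rather than the actual Tez change, and the clock is injected so the expiry can be exercised deterministically.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical sketch of an expiring status cache; not the actual Tez fix.
public class ExpiringStatusClient<S> {
    private final Supplier<S> amFetch;  // returns null when the AM is unreachable
    private final Supplier<S> rmFetch;  // returns null when the RM lookup fails
    private final long ttlMillis;       // how long a cached status stays valid
    private final int maxRetries;       // AM failures the cache may cover
    private final LongSupplier clock;   // injected so expiry is testable

    private S cachedStatus;
    private long cachedAtMillis;
    private int retriesServed;

    public ExpiringStatusClient(Supplier<S> amFetch, Supplier<S> rmFetch,
                                long ttlMillis, int maxRetries, LongSupplier clock) {
        this.amFetch = amFetch;
        this.rmFetch = rmFetch;
        this.ttlMillis = ttlMillis;
        this.maxRetries = maxRetries;
        this.clock = clock;
    }

    public S getStatus() {
        S fromAm = amFetch.get();
        if (fromAm != null) {
            // Fresh answer from the AM: (re)start the validity window.
            cachedStatus = fromAm;
            cachedAtMillis = clock.getAsLong();
            retriesServed = 0;
            return fromAm;
        }
        if (cachedStatus != null && !isExpired()) {
            retriesServed++;
            return cachedStatus;
        }
        // Cache missing or expired: drop it and ask the RM instead.
        cachedStatus = null;
        // A null result here means both the AM and the RM failed.
        return rmFetch.get();
    }

    private boolean isExpired() {
        return clock.getAsLong() - cachedAtMillis >= ttlMillis
                || retriesServed >= maxRetries;
    }

    public static void main(String[] args) {
        long[] now = {0};
        int[] amCalls = {0};
        ExpiringStatusClient<String> client = new ExpiringStatusClient<>(
                () -> amCalls[0]++ == 0 ? "RUNNING" : null, // AM answers once, then is gone
                () -> "SUCCEEDED",                           // RM knows the final state
                5_000, 3, () -> now[0]);
        System.out.println(client.getStatus()); // RUNNING (from the AM)
        now[0] = 1_000;
        System.out.println(client.getStatus()); // RUNNING (cached, still valid)
        now[0] = 6_000;
        System.out.println(client.getStatus()); // SUCCEEDED (TTL expired, from the RM)
    }
}
```

The sketch is not thread-safe; a client polled from several threads would need synchronization around the cached fields.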
--
This message was sent by Atlassian Jira
(v8.20.1#820001)