[ https://issues.apache.org/jira/browse/TEZ-4349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

László Bodor resolved TEZ-4349.
-------------------------------
    Resolution: Fixed

> DAGClient gets stuck with invalid cached DAGStatus
> --------------------------------------------------
>
>                 Key: TEZ-4349
>                 URL: https://issues.apache.org/jira/browse/TEZ-4349
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Ahmed Hussein
>            Assignee: Ahmed Hussein
>            Priority: Major
>             Fix For: 0.10.2
>
>          Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> I found that some Oozie launchers get stuck waiting for the job to complete.
> After investigating, I found that {{dagClient.getDAGStatus(null)}} calls the
> overload {{dagClient.getDAGStatus(null, 0)}}, which in turn calls
> {{getDAGStatusInternal}}, and that method falls back to the {{cachedDagStatus}} field.
> Once the status can no longer be fetched, {{cachedDagStatus}} is never updated
> and keeps being returned, so the launcher waits indefinitely.
>  
> [https://github.com/apache/tez/blob/master/tez-api/src/main/java/org/apache/tez/dag/api/client/DAGClientImpl.java#L212]
> {code:java}
>       if (!dagCompleted) {
>         if (dagStatus != null) {
>           cachedDagStatus = dagStatus;
>           return dagStatus;
>         }
>         if (cachedDagStatus != null) {
>           // could not get from AM (not reachable/ was killed). return cached status.
>           return cachedDagStatus;
>         }
>       }
> {code}
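To see why the launcher never finishes, here is a minimal, self-contained model of the fallback in the snippet above. It is a hypothetical sketch, not the real {{DAGClientImpl}}: {{DagState}}, {{StickyCacheClient}} and {{sawCompletion}} are illustrative names, and the AM call is stubbed as a {{Supplier}} that returns null once the AM is unreachable.

```java
import java.util.function.Supplier;

// Hypothetical model of the cached fallback; NOT the real Tez classes.
enum DagState { RUNNING, SUCCEEDED }

class StickyCacheClient {
  // Stub for the AM call: returns null when the AM is unreachable or killed.
  private final Supplier<DagState> fetchFromAm;
  private DagState cachedDagStatus;

  StickyCacheClient(Supplier<DagState> fetchFromAm) {
    this.fetchFromAm = fetchFromAm;
  }

  DagState getDAGStatus() {
    DagState dagStatus = fetchFromAm.get();
    if (dagStatus != null) {
      cachedDagStatus = dagStatus; // refresh the cache on success
      return dagStatus;
    }
    // Same shape as the quoted code: the cache never expires, so a stale
    // RUNNING status is returned on every call after the AM goes away.
    return cachedDagStatus;
  }

  // Polls up to maxPolls times and reports whether a terminal state was
  // ever observed. The bound exists only so this demo terminates.
  static boolean sawCompletion(StickyCacheClient client, int maxPolls) {
    for (int i = 0; i < maxPolls; i++) {
      if (client.getDAGStatus() == DagState.SUCCEEDED) {
        return true;
      }
    }
    return false;
  }
}
```

With an AM stub that answers RUNNING once and then returns null, {{sawCompletion}} returns false however large {{maxPolls}} is, because every later call hands back the stale cached RUNNING status.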
> +To Fix:+
> The {{cachedDagStatus}} should be valid only for a certain amount of time or a
> certain number of retries.
> When the {{cachedDagStatus}} expires, the DAGClient tries to pull the status from
> the AM or the RM.
> If fetching the status from both the AM and the RM fails, null is returned to
> the caller.
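The proposed fix can be sketched as follows, in plain Java rather than the actual Tez patch. Under the assumption that the cache should expire on whichever limit is reached first, the cached status is served only while it is younger than {{ttlMillis}} and has been reused fewer than {{maxCacheHits}} times; after that the cache is dropped, the RM is asked, and null is returned if that also fails. {{ExpiringStatusClient}}, the two {{Supplier}} stubs, the injectable clock and both limits are hypothetical names chosen for illustration.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical sketch of an expiring status cache; names are illustrative,
// not Tez APIs. Each supplier returns null when its fetch fails.
class ExpiringStatusClient<S> {
  private final Supplier<S> fromAm;
  private final Supplier<S> fromRm;
  private final LongSupplier clockMillis; // injectable so tests control time
  private final long ttlMillis;           // max age of a cached status
  private final int maxCacheHits;         // max times a cached status is reused

  private S cached;
  private long cachedAtMillis;
  private int cacheHits;

  ExpiringStatusClient(Supplier<S> fromAm, Supplier<S> fromRm,
      LongSupplier clockMillis, long ttlMillis, int maxCacheHits) {
    this.fromAm = fromAm;
    this.fromRm = fromRm;
    this.clockMillis = clockMillis;
    this.ttlMillis = ttlMillis;
    this.maxCacheHits = maxCacheHits;
  }

  S getStatus() {
    S status = fromAm.get();
    if (status != null) {
      remember(status);
      return status;
    }
    // AM unreachable: reuse the cache only while both limits still hold.
    if (cached != null
        && clockMillis.getAsLong() - cachedAtMillis < ttlMillis
        && cacheHits < maxCacheHits) {
      cacheHits++;
      return cached;
    }
    // Cache missing or expired: drop it and fall back to the RM.
    cached = null;
    status = fromRm.get();
    if (status != null) {
      remember(status);
      return status;
    }
    return null; // both AM and RM failed; the caller decides what to do
  }

  private void remember(S status) {
    cached = status;
    cachedAtMillis = clockMillis.getAsLong();
    cacheHits = 0;
  }
}
```

A caller polling in a loop now eventually receives either a fresh status or null, and can treat null as a failure instead of waiting forever on a stale RUNNING.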



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
