[ 
https://issues.apache.org/jira/browse/TEZ-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryu Kobayashi updated TEZ-4720:
-------------------------------
    Affects Version/s: 0.10.5

> DagAwareYarnTaskScheduler.getAvailableResources() should clamp negative 
> resource values to 0
> --------------------------------------------------------------------------------------------
>
>                 Key: TEZ-4720
>                 URL: https://issues.apache.org/jira/browse/TEZ-4720
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.10.5
>            Reporter: Ryu Kobayashi
>            Priority: Major
>
> When the YARN ResourceManager temporarily reports negative available 
> resources, DagAwareYarnTaskScheduler.getAvailableResources() returns the 
> negative value as-is. That value is then used to initialize totalResources 
> in getProgress() on the first heartbeat. Once totalResources is negative, 
> totalResources.getMemory() != 0 holds, so it is never updated again and all 
> subsequent resource calculations are incorrect.
>
> getAvailableResources() returns the raw value from 
> client.getAvailableResources() without any validation. There is no guard 
> against negative values, unlike Resource.castToIntSafely() in Hadoop, which 
> was changed in YARN-11964 to clamp negative values to 0.
> The Hadoop-side root cause has been resolved in YARN-11964.
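
For illustration only, here is a minimal sketch of the clamping the issue title asks for. It is not the actual Tez patch. ClampSketch, SimpleResource and clampToZero are hypothetical names, and SimpleResource is a stand-in for YARN's Resource that models only memory and vcores, so the example runs without Hadoop on the classpath.

```java
public final class ClampSketch {

    /** Hypothetical stand-in for YARN's Resource (memory in MB plus vcores). */
    static final class SimpleResource {
        final long memoryMb;
        final int vcores;

        SimpleResource(long memoryMb, int vcores) {
            this.memoryMb = memoryMb;
            this.vcores = vcores;
        }
    }

    /**
     * Returns a copy of the reported resource with every negative
     * component replaced by 0. Non-negative components pass through.
     */
    static SimpleResource clampToZero(SimpleResource raw) {
        return new SimpleResource(
            Math.max(0L, raw.memoryMb),
            Math.max(0, raw.vcores));
    }

    public static void main(String[] args) {
        // Simulates a transient negative report from the resource manager.
        SimpleResource reported = new SimpleResource(-2048L, -1);
        SimpleResource safe = clampToZero(reported);
        System.out.println(safe.memoryMb + " MB, " + safe.vcores + " vcores");
        // prints "0 MB, 0 vcores"
    }
}
```

With a guard of this kind, a transient negative report would reach getProgress() as 0 rather than as a negative number. Assuming the initialization check behaves as described above, totalResources would then stay at 0 and could still be set from a later, valid report.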



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
