[ https://issues.apache.org/jira/browse/TEZ-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ryu Kobayashi updated TEZ-4720:
-------------------------------
Affects Version/s: 0.10.5
> DagAwareYarnTaskScheduler.getAvailableResources() should clamp negative
> resource values to 0
> --------------------------------------------------------------------------------------------
>
> Key: TEZ-4720
> URL: https://issues.apache.org/jira/browse/TEZ-4720
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.10.5
> Reporter: Ryu Kobayashi
> Priority: Major
>
> When the YARN ResourceManager temporarily reports negative available
> resources, DagAwareYarnTaskScheduler.getAvailableResources() returns the
> negative value as-is. getProgress() then uses this value to initialize
> totalResources on the first heartbeat. Once totalResources holds a negative
> value, totalResources.getMemory() != 0, so it is never updated again and all
> subsequent resource calculations are incorrect.
>
> getAvailableResources() returns the raw value from
> client.getAvailableResources() without any validation. It has no guard
> against negative values, unlike Resource.castToIntSafely() in Hadoop, which
> was fixed in YARN-11964 to clamp negative values to 0.
>
> The Hadoop-side root cause has been resolved in YARN-11964.
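
The failure mode and the proposed clamp can be sketched roughly as below. This is a minimal, self-contained model and not the Tez source: ClampSketch, Res, Tracker, onHeartbeat() and clampNonNegative() are hypothetical names, Res stands in for the YARN Resource type, and the "initialize only while memory == 0" check is inferred from the description above.

```java
// Hypothetical model of the behaviour described in TEZ-4720; not the Tez code.
public class ClampSketch {

    /** Minimal stand-in for a YARN Resource: memory in MB and virtual cores. */
    static final class Res {
        final long memoryMb;
        final int vcores;

        Res(long memoryMb, int vcores) {
            this.memoryMb = memoryMb;
            this.vcores = vcores;
        }

        @Override
        public String toString() {
            return "<memory:" + memoryMb + ", vCores:" + vcores + ">";
        }
    }

    /** Clamp each negative component to 0; non-negative values pass through. */
    static Res clampNonNegative(Res r) {
        return new Res(Math.max(0L, r.memoryMb), Math.max(0, r.vcores));
    }

    /** Models a total that is initialized only while its memory is still 0. */
    static final class Tracker {
        private final boolean clamp;
        private Res total = new Res(0, 0);

        Tracker(boolean clamp) {
            this.clamp = clamp;
        }

        /** Called with whatever the resource manager reports as available. */
        void onHeartbeat(Res reportedAvailable) {
            if (total.memoryMb == 0) {
                // Without the clamp, a negative report is stored here and the
                // condition above is never true again.
                total = clamp ? clampNonNegative(reportedAvailable) : reportedAvailable;
            }
        }

        Res total() {
            return total;
        }
    }

    public static void main(String[] args) {
        Res transientNegative = new Res(-1024, -1);
        Res healthy = new Res(8192, 8);

        Tracker unguarded = new Tracker(false);
        unguarded.onHeartbeat(transientNegative);
        unguarded.onHeartbeat(healthy);
        System.out.println("unguarded total: " + unguarded.total());

        Tracker guarded = new Tracker(true);
        guarded.onHeartbeat(transientNegative);
        guarded.onHeartbeat(healthy);
        System.out.println("guarded total:   " + guarded.total());
    }
}
```

In this model the unguarded tracker keeps the negative total after the second heartbeat, while the guarded one stores 0 on the first heartbeat and picks up the healthy value on the next. Whether the real fix should clamp per component like this, or handle the negative report differently, is for the patch to decide.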