[
https://issues.apache.org/jira/browse/FLINK-22505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated FLINK-22505:
-----------------------------------
Labels: pull-request-available (was: )
> Limit the precision of Resource
> -------------------------------
>
> Key: FLINK-22505
> URL: https://issues.apache.org/jira/browse/FLINK-22505
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Coordination
> Affects Versions: 1.13.0
> Reporter: Yangze Guo
> Assignee: Yangze Guo
> Priority: Major
> Labels: pull-request-available
>
> In our internal deployment, we found that a high-precision {{CPUResource}}
> can prevent the required resource from ever being fulfilled. Consider the
> following scenario:
> - The {{SlotManager}} receives a slot request with 1.000000000000001 CPU and
> decides to allocate a pending task manager with that resource spec.
> - The resource manager starts a task manager and sets the CPU via dynamic
> config. In this step, we cast the {{CPUResource}} to a double value, which
> is where the precision loss happens.
> The task manager eventually registers with 1.0 CPU, so it can neither deduct
> any pending task manager nor fulfill the slot request.
> To solve this issue, we propose limiting the precision of Resource to a safe
> value, e.g. 8, to prevent precision loss when the value is cast to double.
> - For {{CPUResource}}, the supported scale for the CPU is 3 in k8s, while in
> Yarn the CPU must be an integer.
> - For {{ExternalResource}}, the value is always treated as an integer.
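
The mismatch described above can be sketched in plain Java. This is only an illustration under stated assumptions, not the actual Flink change: the class name, the method names, and the `MAX_SCALE` constant are hypothetical, and `BigDecimal` stands in for whatever representation {{CPUResource}} uses internally. It compares a requested value with the value that comes back after a cast to double, first exactly and then with both sides rounded to a scale of 8.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class ResourcePrecisionSketch {

    // Hypothetical limit, taken from the "e.g. 8" suggestion in the description.
    static final int MAX_SCALE = 8;

    /** Rounds a resource value to at most MAX_SCALE decimal places. */
    static BigDecimal limitScale(BigDecimal value) {
        return value.setScale(MAX_SCALE, RoundingMode.HALF_UP);
    }

    /** Simulates the cast to double; new BigDecimal(double) is the exact value the double holds. */
    static BigDecimal afterDoubleCast(BigDecimal value) {
        return new BigDecimal(value.doubleValue());
    }

    /** Exact comparison of the requested value with what survives the double cast. */
    static boolean matchesExactly(BigDecimal requested) {
        return requested.compareTo(afterDoubleCast(requested)) == 0;
    }

    /** Same comparison, but with both sides limited to MAX_SCALE decimal places. */
    static boolean matchesWithLimitedScale(BigDecimal requested) {
        BigDecimal registered = limitScale(afterDoubleCast(requested));
        return limitScale(requested).compareTo(registered) == 0;
    }

    public static void main(String[] args) {
        BigDecimal requested = new BigDecimal("1.000000000000001");
        System.out.println("exact match:         " + matchesExactly(requested));
        System.out.println("limited-scale match: " + matchesWithLimitedScale(requested));
    }
}
```

With unlimited precision, the requested value and the value recovered from the double generally differ, so an exact comparison fails. Once both sides are rounded to the same small scale, the comparison no longer depends on digits that a double cannot carry.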