[ 
https://issues.apache.org/jira/browse/FLINK-22505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-22505:
-----------------------------------
    Labels: pull-request-available  (was: )

> Limit the precision of Resource
> -------------------------------
>
>                 Key: FLINK-22505
>                 URL: https://issues.apache.org/jira/browse/FLINK-22505
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.13.0
>            Reporter: Yangze Guo
>            Assignee: Yangze Guo
>            Priority: Major
>              Labels: pull-request-available
>
> In our internal deployment, we found that a high-precision {{CPUResource}} 
> may prevent the required resource from ever being fulfilled. Consider the 
> following scenario:
> - The {{SlotManager}} receives a slot request with 1.000000000000001 CPU and 
> decides to allocate a pending task manager with that resource spec.
> - The resource manager starts a task manager and sets the CPU via dynamic 
> config. In this step, we cast the {{CPUResource}} to a double value, which 
> is where the precision loss happens.
> The task manager will finally register with 1.0 CPU and thus can neither 
> deduct any pending task manager nor fulfill the slot request.
> To solve this issue, we propose limiting the precision of Resource to a safe 
> scale, e.g. 8, to prevent precision loss when casting to double. A scale of 
> 8 is sufficient in practice:
> - For {{CPUResource}}, k8s supports a scale of 3 for the CPU, while in 
> Yarn the CPU must be an integer.
> - For {{ExternalResource}}, the value is always treated as an integer.
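
The proposal above can be sketched in plain Java as follows. This is only an illustration under stated assumptions, not the code from the pull request: the class name, the method names, the HALF_UP rounding mode and the MAX_SCALE constant (taken from the "e.g. 8" in the description) are all hypothetical. The double round trip is simulated with BigDecimal.doubleValue() and BigDecimal.valueOf(double), which may differ from how the dynamic config is actually written and parsed. The sample value has more decimal places than the one in the description; it was chosen because it is certain to collapse to 1.0 as a double in this simulation.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical helper, not part of Flink.
final class ResourcePrecisionSketch {

    // Assumed limit, taken from the "e.g. 8" in the issue description.
    static final int MAX_SCALE = 8;

    // Rounds the value to MAX_SCALE decimal places (rounding mode is an assumption).
    static BigDecimal limitScale(BigDecimal value) {
        return value.setScale(MAX_SCALE, RoundingMode.HALF_UP);
    }

    // Simulates handing the value over as a double and reading it back.
    static BigDecimal roundTripThroughDouble(BigDecimal value) {
        return BigDecimal.valueOf(value.doubleValue());
    }

    // True if the value is numerically unchanged after the double round trip.
    static boolean survivesDoubleRoundTrip(BigDecimal value) {
        return roundTripThroughDouble(value).compareTo(value) == 0;
    }

    public static void main(String[] args) {
        // More decimal places than a double can distinguish from 1.0.
        BigDecimal requested = new BigDecimal("1.00000000000000001");
        BigDecimal limited = limitScale(requested);

        System.out.println("raw survives round trip:     "
                + survivesDoubleRoundTrip(requested));
        System.out.println("limited survives round trip: "
                + survivesDoubleRoundTrip(limited));
    }
}
```

With the scale limited up front, the requested spec and the spec the task manager registers with compare as equal in this sketch, which is the property the slot matching relies on.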



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
