Yangze Guo created FLINK-22505:
----------------------------------
Summary: Limit the precision of Resource
Key: FLINK-22505
URL: https://issues.apache.org/jira/browse/FLINK-22505
Project: Flink
Issue Type: Bug
Components: Runtime / Coordination
Affects Versions: 1.13.0
Reporter: Yangze Guo
In our internal deployment, we found that a high-precision {{CPUResource}} can
prevent the requested resources from ever being fulfilled. Consider the
following scenario:
- The {{SlotManager}} receives a slot request with 1.000000000000001 CPU and
decides to allocate a pending task manager with that resource spec.
- The resource manager starts a task manager and passes the CPU to it through
the dynamic configuration. In this step, the {{CPUResource}} is cast to a
double value, and precision is lost.
As a result, the task manager registers with 1.0 CPU. It therefore cannot be
matched against (and deducted from) the pending task manager, nor can it fulfill
the slot request.
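The precision loss can be reproduced with plain {{java.math.BigDecimal}}. This is a minimal sketch, not Flink code. It assumes that {{CPUResource}} holds its value as a {{BigDecimal}}, that the dynamic config carries a {{double}}, that the task manager converts it back with {{BigDecimal.valueOf(double)}}, and that matching a registered task manager to a pending one requires equal values (a simplification). The class and method names are hypothetical. The requested value has more significant digits than a {{double}} can represent, so it collapses to exactly 1.0.

{code:java}
import java.math.BigDecimal;

public class CpuPrecisionLossDemo {

    // Hypothetical stand-in for the resource manager writing the CPU into the
    // dynamic config: the BigDecimal is narrowed to a double at this point.
    static double toDynamicConfig(BigDecimal cpu) {
        return cpu.doubleValue();
    }

    // Hypothetical stand-in for the task manager reading the CPU back when it
    // registers; BigDecimal.valueOf goes through Double.toString.
    static BigDecimal fromDynamicConfig(double cpu) {
        return BigDecimal.valueOf(cpu);
    }

    // Simplified matching rule (assumption): the registered resource must be
    // numerically equal to the pending spec. compareTo ignores the scale.
    static boolean matchesPendingSpec(BigDecimal requested, BigDecimal registered) {
        return requested.compareTo(registered) == 0;
    }

    public static void main(String[] args) {
        // 20 significant digits: more than a double can hold.
        BigDecimal requested = new BigDecimal("1.0000000000000000001");
        BigDecimal registered = fromDynamicConfig(toDynamicConfig(requested));

        System.out.println("requested  = " + requested);
        System.out.println("registered = " + registered); // 1.0
        System.out.println("matches    = " + matchesPendingSpec(requested, registered)); // false
    }
}
{code}

Because the slot request still asks for the original high-precision value, no task manager started from it can ever satisfy the comparison.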
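To solve this issue, we propose limiting the precision of {{Resource}} to a
safe value, e.g. 8 decimal places, to prevent precision loss when it is cast
to a double. Such a limit gives up no precision that the deployment targets
can actually use: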
- For {{CPUResource}}, Kubernetes supports a CPU scale of at most 3 (i.e.
millicores), while in YARN the CPU must be an integer.
- For {{ExternalResource}}, the value is always treated as an integer.
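A minimal sketch of the proposed limit, applied to the same round trip as in the scenario above. It assumes the value is normalized when a resource is created and that extra digits are truncated with {{RoundingMode.DOWN}}; whether to truncate or round is a design choice the proposal leaves open. The constant {{MAX_SCALE}} and the helper names are hypothetical, not Flink's API. For moderately sized values, 8 decimal places stay well within the roughly 15 to 16 significant decimal digits a {{double}} holds.

{code:java}
import java.math.BigDecimal;
import java.math.RoundingMode;

public class ResourceScaleLimitSketch {

    // Hypothetical constant: the "safe value" from the proposal (8 decimal places).
    static final int MAX_SCALE = 8;

    // Hypothetical normalization applied when a resource value is created.
    // Values that already fit are returned unchanged; longer ones are truncated.
    static BigDecimal limitScale(BigDecimal value) {
        if (value.scale() <= MAX_SCALE) {
            return value;
        }
        return value.setScale(MAX_SCALE, RoundingMode.DOWN);
    }

    // Same round trip as in the scenario: narrow to double for the dynamic
    // config, then read the value back on the task manager side.
    static BigDecimal roundTripThroughDouble(BigDecimal value) {
        return BigDecimal.valueOf(value.doubleValue());
    }

    public static void main(String[] args) {
        BigDecimal requested = limitScale(new BigDecimal("1.0000000000000000001"));
        BigDecimal registered = roundTripThroughDouble(requested);

        System.out.println("requested  = " + requested);  // 1.00000000
        System.out.println("registered = " + registered); // 1.0
        // compareTo ignores scale, so 1.00000000 and 1.0 are considered equal.
        System.out.println("matches    = " + (requested.compareTo(registered) == 0)); // true
    }
}
{code}

With the limit in place, the request is normalized before the pending task manager is created, so the value the task manager registers with matches the pending spec.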