Dennis-Mircea Ciupitu created FLINK-39404:
---------------------------------------------
Summary: HardwareDescription reports incorrect CPU cores in
containerized environments with fractional CPU limits
Key: FLINK-39404
URL: https://issues.apache.org/jira/browse/FLINK-39404
Project: Flink
Issue Type: Bug
Components: API / Core, Runtime / Metrics, Runtime / Task
Reporter: Dennis-Mircea Ciupitu
Fix For: 2.3.0
When Flink runs inside a container with a fractional CPU limit (e.g. 0.5
cores), the Web UI and REST API report an incorrect number of CPU cores. A
TaskManager limited to 0.5 CPU cores is displayed as having 1 CPU core.
{{Hardware.getNumberCPUCores()}} relies on
{{Runtime.getRuntime().availableProcessors()}}, which returns an int. In
containerized environments (Kubernetes, YARN), this value is the ceiling of the
container’s CPU limit, so a limit of {{0.5}} cores is reported as {{1}},
and {{1.5}} cores is reported as {{2}}.
Additionally, {{HardwareDescription.numberOfCPUCores}} is typed as {{int}},
which fundamentally cannot represent fractional CPU allocations.
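To make the rounding concrete, here is a minimal sketch of the behavior described above. {{CpuRoundingSketch}} and {{reportedCores}} are hypothetical names used only for illustration; they are not Flink or JDK code, and the sketch assumes the reported value is simply the ceiling of the container's CPU limit, as stated in this report.

```java
final class CpuRoundingSketch {

    // Hypothetical helper (not Flink or JDK code): models the assumption that
    // the reported core count is the ceiling of the container's CPU limit.
    static int reportedCores(double cpuLimit) {
        return (int) Math.ceil(cpuLimit);
    }

    public static void main(String[] args) {
        // A 0.5-core limit and a 1.5-core limit both lose their fractional part.
        System.out.println("limit 0.5 -> reported " + reportedCores(0.5));
        System.out.println("limit 1.5 -> reported " + reportedCores(1.5));

        // The real JDK call returns an int, so it cannot carry a fraction either.
        int jvmCores = Runtime.getRuntime().availableProcessors();
        System.out.println("availableProcessors() on this machine: " + jvmCores);
    }
}
```

Because the return type is {{int}}, any fix that wants to surface 0.5 has to obtain the limit from another source and carry it in a fractional type.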
This causes two problems:
# *Inaccurate Web UI display*: The Task Manager list and detail pages show
rounded-up CPU cores (e.g. "1" instead of "0.5"), misleading operators about
actual resource allocation.
# *Thread pool over-provisioning*:
{{ClusterEntrypointUtils.getPoolSize()}} computes
{{4 * Hardware.getNumberCPUCores()}}. With {{0.5}} CPU, this creates {{4}} threads
({{4 * ceil(0.5) = 4 * 1 = 4}}) instead of the correct 2 threads
({{ceil(4 * 0.5) = 2}}).
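The difference between the two orders of operations can be sketched as follows. This is an illustration of the arithmetic only, not the actual {{ClusterEntrypointUtils}} code; the class name, method names, and the {{THREADS_PER_CORE}} constant are assumptions introduced for this example.

```java
final class PoolSizeSketch {

    // Multiplier taken from the "4 * cores" formula quoted in this report.
    static final int THREADS_PER_CORE = 4;

    // Current behavior as described: the core count is already rounded up
    // to an int before the multiplication happens.
    static int poolSizeFromRoundedCores(double cpuLimit) {
        int roundedCores = (int) Math.ceil(cpuLimit);
        return THREADS_PER_CORE * roundedCores;
    }

    // Proposed behavior: multiply the fractional value first, then round up.
    static int poolSizeFromFractionalCores(double cpuLimit) {
        return (int) Math.ceil(THREADS_PER_CORE * cpuLimit);
    }

    public static void main(String[] args) {
        double cpuLimit = 0.5;
        System.out.println("round first:    " + poolSizeFromRoundedCores(cpuLimit));
        System.out.println("multiply first: " + poolSizeFromFractionalCores(cpuLimit));
    }
}
```

For whole-number limits both methods agree, so only fractional allocations would see a change in pool size.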
h3. Steps to Reproduce
# Deploy a Flink cluster on Kubernetes with
{{kubernetes.taskmanager.cpu.amount: 0.5}}
# Open the Flink Web UI → Task Managers
# Observe that "CPU Cores" shows *1* instead of *0.5*
h3. Expected Behavior
* The Task Manager Web UI should display *0.5* CPU cores
* Thread pool sizing should multiply the fractional value before applying the
ceiling: {{ceil(4 * 0.5) = 2}} threads
h3. Actual Behavior
* The Web UI displays *1* CPU core
* Thread pool sizing over-provisions: {{4 * ceil(0.5) = 4}} threads