Dennis-Mircea Ciupitu created FLINK-39404:
---------------------------------------------

             Summary: HardwareDescription reports incorrect CPU cores in 
containerized environments with fractional CPU limits
                 Key: FLINK-39404
                 URL: https://issues.apache.org/jira/browse/FLINK-39404
             Project: Flink
          Issue Type: Bug
          Components: API / Core, Runtime / Metrics, Runtime / Task
            Reporter: Dennis-Mircea Ciupitu
             Fix For: 2.3.0


When Flink runs inside a container with a fractional CPU limit (e.g. 0.5 
cores), the Web UI and REST API report an incorrect number of CPU cores. A 
TaskManager limited to 0.5 CPU cores is displayed as having 1 CPU core.

{{Hardware.getNumberCPUCores()}} relies on
{{Runtime.getRuntime().availableProcessors()}}, which returns an {{int}}. In
containerized environments (Kubernetes, YARN), this value is the ceiling of the
container’s CPU limit, so a limit of {{0.5}} cores is reported as {{1}},
and {{1.5}} cores is reported as {{2}}.

Additionally, {{HardwareDescription.numberOfCPUCores}} is typed as {{int}},
which fundamentally cannot represent fractional CPU allocations.
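
The rounding described above can be modeled with a short, self-contained
sketch. It is not Flink code: the class {{CpuCeilingSketch}} and the method
{{reportedCores}} are hypothetical names, and the lower bound of 1 is an
assumption based on {{availableProcessors()}} never returning less than one.

```java
/** Hypothetical model of the rounding described in this issue; not Flink code. */
final class CpuCeilingSketch {

    /** Models an int-typed core count derived from a fractional container CPU limit. */
    static int reportedCores(double cpuLimit) {
        // Assumption: the limit is rounded up to a whole core and never drops below 1.
        return Math.max(1, (int) Math.ceil(cpuLimit));
    }

    public static void main(String[] args) {
        // Compare a few fractional limits with the whole-core value an int can hold.
        for (double limit : new double[] {0.5, 1.0, 1.5}) {
            System.out.println("limit " + limit + " is reported as " + reportedCores(limit));
        }
        // The value the JVM reports on the machine running this sketch.
        System.out.println("availableProcessors() = "
                + Runtime.getRuntime().availableProcessors());
    }
}
```

Because the return type is {{int}}, the fractional part is lost before any
caller sees the value, which is why a wider type would be needed to report
{{0.5}}.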

This causes two problems:
 # *Inaccurate Web UI display*: The Task Manager list and detail pages show
rounded-up CPU cores (e.g. "1" instead of "0.5"), misleading operators about
actual resource allocation.

 # *Thread pool over-provisioning*:
{{ClusterEntrypointUtils.getPoolSize()}} computes
{{4 * Hardware.getNumberCPUCores()}}. With {{0.5}} CPU, this creates {{4}} threads
({{4 * ceil(0.5) = 4 * 1 = 4}}) instead of the correct 2 threads
({{ceil(4 * 0.5) = 2}}).
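
The arithmetic in the second item can be checked with a minimal sketch that
compares the two orderings of the ceiling. The class {{PoolSizeSketch}}, its
method names, and the constant {{THREADS_PER_CORE}} are hypothetical and only
mirror the formulas quoted above; they are not the actual
{{ClusterEntrypointUtils}} implementation.

```java
/** Hypothetical comparison of the two pool-size formulas quoted in this issue. */
final class PoolSizeSketch {

    // Multiplier taken from the formula quoted in the issue text.
    static final int THREADS_PER_CORE = 4;

    /** Current behavior as described: round the cores up first, then multiply. */
    static int poolSizeFromRoundedCores(double cpuLimit) {
        int roundedCores = (int) Math.ceil(cpuLimit);
        return THREADS_PER_CORE * roundedCores;
    }

    /** Expected behavior as described: multiply first, then round up. */
    static int poolSizeFromFractionalCores(double cpuLimit) {
        return (int) Math.ceil(THREADS_PER_CORE * cpuLimit);
    }

    public static void main(String[] args) {
        for (double limit : new double[] {0.5, 1.5, 2.0}) {
            System.out.println("limit " + limit
                    + ": rounded-first = " + poolSizeFromRoundedCores(limit)
                    + ", fractional-first = " + poolSizeFromFractionalCores(limit));
        }
    }
}
```

The two formulas agree whenever the CPU limit is a whole number and diverge
only for fractional limits, which matches the over-provisioning described
above.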

h3. Steps to Reproduce
 # Deploy a Flink cluster on Kubernetes with 
{{kubernetes.taskmanager.cpu.amount: 0.5}}
 # Open the Flink Web UI → Task Managers
 # Observe that "CPU Cores" shows *1* instead of *0.5*

h3. Expected Behavior
 * The Task Manager Web UI should display *0.5* CPU cores
 * Thread pool sizing should multiply by the fractional value first and apply
the ceiling afterwards: {{ceil(4 * 0.5) = 2}} threads

h3. Actual Behavior
 * The Web UI displays *1* CPU core
 * Thread pool sizing over-provisions: {{4 * ceil(0.5) = 4}} threads



