Liu created FLINK-38961:
---------------------------

             Summary: Display process metrics (CPU, Memory, I/O) on TaskManager Web UI
                 Key: FLINK-38961
                 URL: https://issues.apache.org/jira/browse/FLINK-38961
             Project: Flink
          Issue Type: Improvement
          Components: Runtime / Web Frontend
            Reporter: Liu


### Summary

Add a new "Process Usage" panel on the TaskManager Metrics page of the Flink 
Web UI to display process-level metrics, including CPU usage, memory (RSS), and 
I/O statistics.

### Motivation

Currently, the TaskManager Metrics page in the Flink Web UI displays only JVM 
and Flink-managed memory metrics. However, users often need process-level 
resource consumption as well, to understand how much CPU, memory, and I/O a 
TaskManager process actually uses.

When `metrics.system-resource` is enabled, Flink collects process-level metrics 
such as:
- `Process.CPU.Usage` - CPU usage percentage of the process
- `Process.Memory.RSS` - Resident Set Size (physical memory used by the process)
- `Process.IO.Read` / `Process.IO.Write` - I/O read and write bytes

These metrics are already available through the REST API but are not displayed 
in the Web UI, making it inconvenient for users to monitor them.
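
As a rough illustration of the REST access mentioned above, the sketch below builds a query for these four metrics and parses a response. It assumes the TaskManager metrics endpoint takes a comma-separated `get` parameter and returns `id`/`value` pairs with string values; the helper names `buildMetricsUrl` and `toMetricRecord` are hypothetical and not part of the Flink code base.

```typescript
// Metric names as listed in this issue.
const PROCESS_METRICS = [
  'Process.CPU.Usage',
  'Process.Memory.RSS',
  'Process.IO.Read',
  'Process.IO.Write'
] as const;

// Assumed shape of one response entry: the value arrives as a string.
interface MetricEntry {
  id: string;
  value: string;
}

// Hypothetical helper: builds the metrics query URL for one TaskManager.
function buildMetricsUrl(baseUrl: string, taskManagerId: string, names: readonly string[]): string {
  const base = baseUrl.replace(/\/+$/, ''); // drop trailing slashes
  return `${base}/taskmanagers/${encodeURIComponent(taskManagerId)}/metrics?get=${names.join(',')}`;
}

// Hypothetical helper: converts the response into a name -> number lookup,
// skipping entries whose value is empty or not a finite number.
function toMetricRecord(entries: MetricEntry[]): Record<string, number> {
  const result: Record<string, number> = {};
  for (const entry of entries) {
    const parsed = Number(entry.value);
    if (entry.value.trim() !== '' && isFinite(parsed)) {
      result[entry.id] = parsed;
    }
  }
  return result;
}
```

For example, `buildMetricsUrl('http://localhost:8081', 'tm-1', PROCESS_METRICS)` produces a URL ending in `/taskmanagers/tm-1/metrics?get=Process.CPU.Usage,Process.Memory.RSS,Process.IO.Read,Process.IO.Write`; the host and TaskManager ID here are placeholders.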

### Proposed Changes

1. **Add a "Process Usage" card** on the TaskManager Metrics page 
(`task-manager-metrics.component.html`) displaying:
   - **CPU**: Process CPU usage percentage
   - **Memory**: Process RSS (Resident Set Size)
   - **I/O**: Combined read and write I/O bytes

2. **Extend the metrics query** in `task-manager-metrics.component.ts` to 
include:
   - `Process.CPU.Usage`
   - `Process.Memory.RSS`
   - `Process.IO.Read`
   - `Process.IO.Write`

### Prerequisites

Users need to enable system resource metrics by setting 
`metrics.system-resource: true` in the Flink configuration (it is disabled by 
default). If this option is not enabled, the process metrics will show as 
empty/zero.
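
One possible way to handle the disabled case in the UI is sketched below: a metric that was never reported renders as a placeholder, while a reported `0` is formatted normally, so the two stay distinguishable. This is only an assumption about how the panel could behave, not a description of the proposed implementation; `MetricValues` and `displayMetric` are hypothetical names.

```typescript
// Hypothetical lookup of metric name -> parsed value; a metric that was not
// reported is simply absent from the object.
type MetricValues = { [name: string]: number | undefined };

// Returns the formatted value, or a placeholder when the metric is missing.
// A reported 0 is formatted like any other value.
function displayMetric(
  values: MetricValues,
  name: string,
  format: (value: number) => string,
  placeholder: string = '-'
): string {
  const value = values[name];
  return value === undefined ? placeholder : format(value);
}
```

With this sketch, `displayMetric({}, 'Process.CPU.Usage', v => String(v))` yields the placeholder, whereas a lookup containing `'Process.Memory.RSS': 0` goes through the formatter.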

### UI Mockup

The new "Process Usage" panel will be placed at the top of the TaskManager 
Metrics page, showing three columns:
- CPU (percentage, shown with 6 decimal places)
- Memory (humanized bytes format)
- I/O (sum of read and write bytes, humanized)
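
The three column formats could be sketched roughly as follows. This assumes the CPU value is already expressed in percent and that byte counts are humanized with binary (1024-based) units; `formatCpu`, `humanizeBytes`, and `totalIo` are hypothetical helpers written for illustration, and the Web UI's own byte-formatting pipe may differ.

```typescript
// Unit labels for the assumed 1024-based scaling.
const UNITS = ['B', 'KB', 'MB', 'GB', 'TB', 'PB'];

// CPU column: fixed six decimal places, assuming the input is already a percentage.
function formatCpu(usagePercent: number): string {
  return `${usagePercent.toFixed(6)}%`;
}

// Memory and I/O columns: scale a byte count to the largest fitting unit.
function humanizeBytes(bytes: number): string {
  if (!isFinite(bytes) || bytes < 0) {
    return '-';
  }
  let value = bytes;
  let unit = 0;
  while (value >= 1024 && unit < UNITS.length - 1) {
    value /= 1024;
    unit += 1;
  }
  // Plain bytes are printed as-is; larger units get two decimals.
  return unit === 0 ? `${value} B` : `${value.toFixed(2)} ${UNITS[unit]}`;
}

// I/O column: the mockup shows the sum of read and write bytes.
function totalIo(readBytes: number, writeBytes: number): number {
  return readBytes + writeBytes;
}
```

For instance, `humanizeBytes(totalIo(1048576, 1048576))` gives `2.00 MB` and `formatCpu(12.5)` gives `12.500000%`.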

### Related Documentation

- [System Resource Metrics](https://nightlies.apache.org/flink/flink-docs-stable/docs/ops/metrics/#system-resources)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
