Baymine opened a new pull request, #64030:
URL: https://github.com/apache/doris/pull/64030
Adds BE resource usage reporting (CPU and memory) to FE, with k8s-aware CPU
core collection via CpuInfo::num_cores().
### What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
Currently there is no way to monitor BE node CPU and memory usage from FE.
The `SHOW BACKENDS` command and `backends()` table function do not expose
real-time resource utilization metrics, making it difficult to assess
cluster resource health and plan capacity.
### Release note
Doris now supports reporting BE resource usage (CPU usage and memory
usage) to FE. Two new columns `CpuUsedPct` and `MemUsedPct` are
added to `SHOW BACKENDS` and the `backends()` table function. CPU core
count is collected in a k8s-aware manner via
`CpuInfo::num_cores()`.
<img width="956" height="567" alt="image"
src="https://github.com/user-attachments/assets/cdc5d8ed-eb5a-4e92-99d2-07199e8a1ff0"
/>
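
To illustrate what "k8s-aware" could mean for the `CpuUsedPct` value, here is a minimal sketch, not the actual `CpuInfo::num_cores()` implementation. It assumes the container CPU limit is expressed as a cgroup quota and period in microseconds, and the helper names `effective_cores` and `cpu_used_pct` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: core count as seen under a container CPU limit.
// Assumption: quota_us <= 0 means "no limit", in which case the host
// core count is used unchanged.
inline int effective_cores(int host_cores, int64_t quota_us, int64_t period_us) {
    if (quota_us <= 0 || period_us <= 0) {
        return host_cores;
    }
    // Round a fractional limit (e.g. 1.5 CPUs) up to the next whole core.
    int limited = static_cast<int>(
            std::ceil(static_cast<double>(quota_us) / static_cast<double>(period_us)));
    return std::max(1, std::min(host_cores, limited));
}

// Hypothetical helper: CPU usage over a sampling window, normalised by the
// effective core count instead of the host core count, clamped to [0, 100].
inline double cpu_used_pct(double cpu_seconds_delta, double wall_seconds_delta, int cores) {
    if (wall_seconds_delta <= 0.0 || cores <= 0) {
        return 0.0;
    }
    double pct = 100.0 * cpu_seconds_delta / (wall_seconds_delta * cores);
    return std::max(0.0, std::min(100.0, pct));
}
```

With a limit of 2 CPUs on a 64-core host, 5 CPU-seconds consumed over a 5-second window would read as 50% under this sketch, rather than under 2% if the host core count were used.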
### Check List (For Author)
- Test <!-- At least one of them must be included. -->
    - [ ] Regression test
    - [ ] Unit Test
    - [x] Manual test (add detailed scripts or steps below)
        - Deploy BE and FE, verify `SHOW BACKENDS` output includes `CpuUsedPct` and `MemUsedPct` columns with valid percentage values.
        - Verify `SELECT * FROM backends()` returns the two new columns with correct values.
        - Verify CPU usage reflects k8s-aware core count when running in a container with CPU limits set.
    - [ ] No need to test or manual test. Explain why:
        - [ ] This is a refactor/code format and no logic has been changed.
        - [ ] Previous test can cover this change.
        - [ ] No code files have been changed.
        - [ ] Other reason <!-- Add your reason? -->
- Behavior changed:
    - [ ] No.
    - [x] Yes. <!-- Explain the behavior change -->
        - `SHOW BACKENDS` and `backends()` table function now include two additional columns: `CpuUsedPct` (CPU usage percentage) and `MemUsedPct` (memory usage percentage).
        - BE now periodically reports resource usage to FE via a new `REPORT_RESOURCE_USAGE` report worker (default interval: 5 seconds, configurable via `report_resource_usage_interval_seconds`).
- Does this need documentation?
    - [ ] No.
    - [x] Yes. <!-- Add document PR link here. eg: https://github.com/apache/doris-website/pull/1214 -->
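
For reviewers checking the behavior change and the manual test steps above, the following is a rough sketch of how a memory percentage and a report-interval check might be computed. It is an illustration under stated assumptions, not code from this PR: the helper names `mem_used_pct` and `report_due` are hypothetical, and only the 5-second default comes from the description of `report_resource_usage_interval_seconds`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: memory usage as a percentage of the limit,
// clamped to [0, 100] so the reported value is always a valid percentage.
inline double mem_used_pct(int64_t used_bytes, int64_t limit_bytes) {
    if (limit_bytes <= 0 || used_bytes <= 0) {
        return 0.0;
    }
    double pct = 100.0 * static_cast<double>(used_bytes) / static_cast<double>(limit_bytes);
    return pct > 100.0 ? 100.0 : pct;
}

// Hypothetical helper: decides whether the next resource usage report is due.
// The default of 5 seconds mirrors the documented default interval; how the
// real report worker schedules itself is not described in this PR text.
inline bool report_due(int64_t now_s, int64_t last_report_s, int64_t interval_s = 5) {
    return now_s - last_report_s >= interval_s;
}
```

Clamping keeps the displayed value inside the range a reader would expect from a column named `MemUsedPct`, even if a sample briefly exceeds the configured limit.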
### Check List (For Reviewer who merge this PR)
- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label <!-- Add branch pick label that this PR should merge into -->
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]