Yukang-Lian opened a new pull request, #61428:
URL: https://github.com/apache/doris/pull/61428
## Summary
- Add a new system table `information_schema.be_compaction_tasks` that
exposes compaction task metadata across all BEs. It covers the PENDING,
RUNNING, FINISHED, and FAILED states and has 30 columns spanning
identification, timing, input/output stats, IO stats, and resource usage.
- Introduce a `CompactionTaskTracker` singleton that tracks compaction tasks
across their full lifecycle and is integrated at all 7 compaction entry points
(local: base/cumu/full, single-replica, cold-data, manual HTTP; cloud:
base/cumu/full, manual HTTP, index-change).
- Support multi-BE fan-out queries via `BackendPartitionedSchemaScanNode`,
fallback records for missed registrations, and cleanup on early-return
paths.
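
To make the lifecycle described above concrete, here is a minimal Python sketch of a thread-safe task tracker. It is an illustration only and not the `CompactionTaskTracker` C++ code from this PR: the class `TrackerSketch`, its method names, and the `TaskRecord` fields are all hypothetical. It also assumes one plausible reading of "fallback records" (a record is created on the fly when a task starts without having been registered) and of "cleanup on early-return paths" (a pending record is dropped when the task never runs).

```python
import threading
import time
from dataclasses import dataclass, replace
from enum import Enum
from typing import Dict, List, Optional


class Status(Enum):
    PENDING = "PENDING"
    RUNNING = "RUNNING"
    FINISHED = "FINISHED"
    FAILED = "FAILED"


@dataclass
class TaskRecord:
    # Hypothetical subset of the 30 columns, enough to show the lifecycle.
    compaction_id: int
    tablet_id: int
    compaction_type: str
    status: Status
    scheduled_time: float
    start_time: Optional[float] = None
    end_time: Optional[float] = None
    status_msg: str = ""


class TrackerSketch:
    """Hypothetical stand-in for a compaction task tracker (not the real API)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._tasks: Dict[int, TaskRecord] = {}

    def register_pending(self, compaction_id: int, tablet_id: int,
                         compaction_type: str) -> None:
        with self._lock:
            self._tasks[compaction_id] = TaskRecord(
                compaction_id, tablet_id, compaction_type,
                Status.PENDING, time.time())

    def mark_running(self, compaction_id: int, tablet_id: int,
                     compaction_type: str) -> None:
        now = time.time()
        with self._lock:
            rec = self._tasks.get(compaction_id)
            if rec is None:
                # Assumed fallback: the entry point skipped registration,
                # so create the record here instead of losing the task.
                rec = TaskRecord(compaction_id, tablet_id, compaction_type,
                                 Status.PENDING, now)
                self._tasks[compaction_id] = rec
            rec.status = Status.RUNNING
            rec.start_time = now

    def complete(self, compaction_id: int, error: str = "") -> None:
        with self._lock:
            rec = self._tasks.get(compaction_id)
            if rec is None:
                return
            rec.status = Status.FAILED if error else Status.FINISHED
            rec.end_time = time.time()
            rec.status_msg = error

    def remove(self, compaction_id: int) -> None:
        # Assumed early-return cleanup: drop a record whose task never ran.
        with self._lock:
            self._tasks.pop(compaction_id, None)

    def snapshot(self) -> List[TaskRecord]:
        # Return copies so readers never see a record mid-update.
        with self._lock:
            return [replace(rec) for rec in self._tasks.values()]


# A module-level instance plays the role of the singleton in this sketch.
TRACKER = TrackerSketch()
```

In this sketch every entry point would call `register_pending`, `mark_running`, and `complete` in order, and a schema scanner would read `snapshot()` to produce rows. Copying records under the lock keeps a concurrent query from observing a half-updated task.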
## Usage Examples
```sql
-- View all running compaction tasks across all BEs
mysql> SELECT BACKEND_ID, TABLET_ID, COMPACTION_TYPE, STATUS,
              ELAPSED_TIME_MS, INPUT_DATA_SIZE
       FROM information_schema.be_compaction_tasks
       WHERE STATUS = 'RUNNING'
       ORDER BY ELAPSED_TIME_MS DESC;
+------------+-----------+-----------------+---------+-----------------+-----------------+
| BACKEND_ID | TABLET_ID | COMPACTION_TYPE | STATUS  | ELAPSED_TIME_MS | INPUT_DATA_SIZE |
+------------+-----------+-----------------+---------+-----------------+-----------------+
|      10001 |    123456 | base            | RUNNING |           35210 |       524288000 |
|      10002 |    789012 | cumulative      | RUNNING |            1250 |        10485760 |
+------------+-----------+-----------------+---------+-----------------+-----------------+

-- Find the slowest compactions (potential performance issues)
mysql> SELECT TABLET_ID, COMPACTION_TYPE, ELAPSED_TIME_MS, INPUT_DATA_SIZE,
              OUTPUT_DATA_SIZE, PEAK_MEMORY_BYTES, IS_VERTICAL, STATUS_MSG
       FROM information_schema.be_compaction_tasks
       WHERE STATUS IN ('FINISHED', 'FAILED')
       ORDER BY ELAPSED_TIME_MS DESC LIMIT 5;
+-----------+-----------------+-----------------+-----------------+------------------+-------------------+-------------+----------------------------+
| TABLET_ID | COMPACTION_TYPE | ELAPSED_TIME_MS | INPUT_DATA_SIZE | OUTPUT_DATA_SIZE | PEAK_MEMORY_BYTES | IS_VERTICAL | STATUS_MSG                 |
+-----------+-----------------+-----------------+-----------------+------------------+-------------------+-------------+----------------------------+
|    123456 | base            |           42000 |       524288000 |        210000000 |         268435456 |           1 |                            |
|    567890 | full            |            8500 |        30000000 |                0 |          33554432 |           0 | [INTERNAL_ERROR]disk full  |
+-----------+-----------------+-----------------+-----------------+------------------+-------------------+-------------+----------------------------+

-- Check remote IO ratio (important for disaggregated storage)
mysql> SELECT TABLET_ID, COMPACTION_TYPE, BYTES_READ_FROM_LOCAL, BYTES_READ_FROM_REMOTE,
              ROUND(BYTES_READ_FROM_REMOTE * 100.0
                    / (BYTES_READ_FROM_LOCAL + BYTES_READ_FROM_REMOTE + 1), 2) AS remote_pct
       FROM information_schema.be_compaction_tasks
       WHERE STATUS = 'FINISHED' AND BYTES_READ_FROM_REMOTE > 0
       ORDER BY remote_pct DESC;
+-----------+-----------------+-----------------------+------------------------+------------+
| TABLET_ID | COMPACTION_TYPE | BYTES_READ_FROM_LOCAL | BYTES_READ_FROM_REMOTE | remote_pct |
+-----------+-----------------+-----------------------+------------------------+------------+
|    234567 | cumulative      |              10485760 |              104857600 |      90.91 |
|    345678 | base            |             524288000 |               52428800 |       9.09 |
+-----------+-----------------+-----------------------+------------------------+------------+
```
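
As a quick check of the `remote_pct` arithmetic in the last query, the expression can be mirrored in a few lines of Python. The function name `remote_pct` is just a label for this sketch; the formula, including the `+ 1` in the denominator, is copied from the SQL above.

```python
def remote_pct(bytes_local: int, bytes_remote: int) -> float:
    """Mirror ROUND(remote * 100.0 / (local + remote + 1), 2) from the query."""
    return round(bytes_remote * 100.0 / (bytes_local + bytes_remote + 1), 2)


# Sample rows from the example output above.
print(remote_pct(10485760, 104857600))   # 90.91
print(remote_pct(524288000, 52428800))   # 9.09
```

The `+ 1` guards against division by zero when both byte counters are 0. With the `BYTES_READ_FROM_REMOTE > 0` filter in the query that case cannot occur, so the guard only matters if the filter is removed; its effect on the percentage is negligible at these sizes.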
### Full Schema (30 columns)
```
BACKEND_ID, COMPACTION_ID, TABLE_ID, PARTITION_ID, TABLET_ID,
COMPACTION_TYPE, STATUS, TRIGGER_METHOD, COMPACTION_SCORE,
SCHEDULED_TIME, START_TIME, END_TIME, ELAPSED_TIME_MS,
INPUT_ROWSETS_COUNT, INPUT_ROW_NUM, INPUT_DATA_SIZE, INPUT_SEGMENTS_NUM, INPUT_VERSION_RANGE,
MERGED_ROWS, FILTERED_ROWS, OUTPUT_ROW_NUM, OUTPUT_DATA_SIZE, OUTPUT_SEGMENTS_NUM, OUTPUT_VERSION,
BYTES_READ_FROM_LOCAL, BYTES_READ_FROM_REMOTE, PEAK_MEMORY_BYTES,
IS_VERTICAL, PERMITS, STATUS_MSG
```
## Test plan
- [x] BE unit tests: 14 cases covering full lifecycle, failure paths,
fallback records, concurrent safety, config changes, input_version_range
backfill
- [x] Regression test: end-to-end SQL query validation with manual
compaction trigger, field verification, filtering

Closes #48893