heguanhui opened a new pull request, #63314:
URL: https://github.com/apache/doris/pull/63314
## Summary
- Fix deduplication bug in `ADMIN CLEAN TRASH` command where duplicate clean
trash tasks could be sent to the same BE
- Add `Set<Long>` dedup by backend ID in both `getNeedCleanedBackends()` and
`cleanTrash()` methods
- Add unit test `AdminCleanTrashCommandTest` covering duplicate queries,
distinct queries, and clean-all scenarios
## What problem does this PR solve?
Issue Number: close #xxx
Problem Summary: The `ADMIN CLEAN TRASH` command could send duplicate clean
trash tasks to the same BE. When `backendsQuery` contains duplicate entries
(e.g., same `host:port` specified twice), or when different string
representations resolve to the same backend, `getNeedCleanedBackends()` would
add the same backend multiple times. Additionally, `cleanTrash()` did not
deduplicate by backend ID before creating tasks, so duplicate backends in the
list would result in multiple `CleanTrashTask` objects being sent to the same
BE.
### Root Cause
1. In `getNeedCleanedBackends()`, the `backendsID.remove(backendQuery)` call
deduplicates only by the string key (`host:port`). If the user supplies
duplicate entries, or different strings that resolve to the same backend, that
backend could be added multiple times.
2. In `cleanTrash()`, there was no deduplication by backend ID. The method
iterated over the `backends` list and created a `CleanTrashTask` for each entry
without checking for duplicates.
3. `CleanTrashTask` has signature `-1` and is NOT added to `AgentTaskQueue`
(only `AgentTaskExecutor.submit(batchTask)` is called), so the queue dedup
mechanism does not apply.
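
To make the second cause concrete, here is a minimal Java sketch of a task-building loop that does not deduplicate by ID. `DuplicateTaskDemo` and `buildTaskTargets` are hypothetical names used only for illustration, and backends are reduced to their `long` IDs. This is not the Doris code itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a loop without ID dedup; not the actual Doris implementation.
class DuplicateTaskDemo {
    // Returns one task target (backend ID) per list entry, with no dedup.
    static List<Long> buildTaskTargets(List<Long> backendIds) {
        List<Long> targets = new ArrayList<>();
        for (Long id : backendIds) {
            // One task per entry, even if this ID was already seen.
            targets.add(id);
        }
        return targets;
    }

    public static void main(String[] args) {
        // Backend 10001 appears twice in the input, so it is targeted twice.
        System.out.println(buildTaskTargets(Arrays.asList(10001L, 10002L, 10001L)));
    }
}
```

With no queue-level dedup behind it, every repeated entry in the input becomes a repeated task for the same backend.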
### Fix
- Add `Set<Long> addedIds` in `getNeedCleanedBackends()` to deduplicate by
backend ID when processing `backendsQuery`
- Add `Set<Long> addedBackendIds` in `cleanTrash()` as a safety net to
deduplicate by backend ID before creating tasks
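
The two-layer dedup described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the patch itself: the `Backend` class, the `byQuery` lookup map, and the `resolve` and `taskTargets` methods are hypothetical stand-ins. Only the `Set<Long>` names `addedIds` and `addedBackendIds` come from the PR description.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of dedup by backend ID; not the actual Doris classes.
class CleanTrashDedupSketch {
    // Simplified stand-in for a backend: just an ID and an address string.
    static final class Backend {
        final long id;
        final String hostPort;

        Backend(long id, String hostPort) {
            this.id = id;
            this.hostPort = hostPort;
        }
    }

    // Layer 1: resolve query strings to backends, skipping IDs already added.
    // byQuery is an assumed lookup in which several strings may map to one backend.
    static List<Backend> resolve(Map<String, Backend> byQuery, List<String> queries) {
        List<Backend> result = new ArrayList<>();
        Set<Long> addedIds = new HashSet<>();
        for (String query : queries) {
            Backend be = byQuery.get(query);
            // Set.add returns false when the ID is already present.
            if (be != null && addedIds.add(be.id)) {
                result.add(be);
            }
        }
        return result;
    }

    // Layer 2: safety net before task creation, one target per distinct backend ID.
    static List<Long> taskTargets(List<Backend> backends) {
        List<Long> targets = new ArrayList<>();
        Set<Long> addedBackendIds = new HashSet<>();
        for (Backend be : backends) {
            if (addedBackendIds.add(be.id)) {
                targets.add(be.id);
            }
        }
        return targets;
    }

    public static void main(String[] args) {
        Backend be1 = new Backend(10001L, "10.0.0.1:9050");
        Backend be2 = new Backend(10002L, "10.0.0.2:9050");
        Map<String, Backend> byQuery = new HashMap<>();
        byQuery.put("10.0.0.1:9050", be1);
        byQuery.put("be1.example:9050", be1); // a second string for the same backend
        byQuery.put("10.0.0.2:9050", be2);

        List<Backend> resolved = resolve(byQuery,
                Arrays.asList("10.0.0.1:9050", "be1.example:9050", "10.0.0.1:9050", "10.0.0.2:9050"));
        System.out.println(taskTargets(resolved));
    }
}
```

Keying both sets on the numeric backend ID, rather than on the query string, is what makes the check independent of how the user spelled the address.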
## Release note
Fixed a bug where `ADMIN CLEAN TRASH` could send duplicate clean tasks to the
same backend when duplicate backend addresses were specified.
## Check List (For Author)
- Test: Unit Test
- `AdminCleanTrashCommandTest` verifies dedup with duplicate queries,
distinct queries, and clean-all scenarios
- Behavior changed: No
- Does this need documentation: No
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]