heguanhui opened a new pull request, #63314:
URL: https://github.com/apache/doris/pull/63314

   ## Summary
   - Fix a deduplication bug in the `ADMIN CLEAN TRASH` command where duplicate 
clean trash tasks could be sent to the same BE
   - Add `Set<Long>` dedup by backend ID in both `getNeedCleanedBackends()` and 
`cleanTrash()` methods
   - Add unit test `AdminCleanTrashCommandTest` covering duplicate queries, 
distinct queries, and clean-all scenarios
   
   ## What problem does this PR solve?
   
   Issue Number: close #xxx
   
   Problem Summary: The `ADMIN CLEAN TRASH` command could send duplicate clean 
trash tasks to the same BE. When `backendsQuery` contains duplicate entries 
(e.g., the same `host:port` specified twice), or when different string 
representations resolve to the same backend, `getNeedCleanedBackends()` would 
add the same backend multiple times. In addition, `cleanTrash()` did not 
deduplicate by backend ID before creating tasks, so each duplicate backend in 
the list produced another `CleanTrashTask` that was sent to the same BE.
   
   ### Root Cause
   
   1. In `getNeedCleanedBackends()`, the `backendsID.remove(backendQuery)` call 
deduplicates only by the string key (`host:port`). If the user supplies the 
same backend through different string representations or duplicate entries, 
that backend could be added multiple times.
   
   2. In `cleanTrash()`, there was no deduplication by backend ID. The method 
iterated over the `backends` list and created a `CleanTrashTask` for each entry 
without checking for duplicates.
   
   3. `CleanTrashTask` has signature `-1` and is NOT added to `AgentTaskQueue` 
(only `AgentTaskExecutor.submit(batchTask)` is called), so the queue dedup 
mechanism does not apply.
   
   ### Fix
   
   - Add `Set<Long> addedIds` in `getNeedCleanedBackends()` to deduplicate by 
backend ID when processing `backendsQuery`
   - Add `Set<Long> addedBackendIds` in `cleanTrash()` as a safety net to 
deduplicate by backend ID before creating tasks
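
The two dedup steps above can be sketched in isolation as follows. This is a
minimal illustration, not the code in this PR: the `Backend` class, the
`byAddress` lookup map, the method names `selectBackends` and
`planTaskTargets`, and the sample addresses and IDs are all hypothetical
stand-ins. Only the `Set<Long>` dedup-by-ID idea and the set names `addedIds`
and `addedBackendIds` come from the description.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class CleanTrashDedupSketch {
    /** Hypothetical stand-in for a backend; only the ID and address matter here. */
    static final class Backend {
        final long id;
        final String hostPort;

        Backend(long id, String hostPort) {
            this.id = id;
            this.hostPort = hostPort;
        }
    }

    /** Resolve each query string to a backend, keeping at most one entry per backend ID. */
    static List<Backend> selectBackends(List<String> backendsQuery, Map<String, Backend> byAddress) {
        Set<Long> addedIds = new HashSet<>();
        List<Backend> selected = new ArrayList<>();
        for (String query : backendsQuery) {
            Backend backend = byAddress.get(query);
            // Set.add returns false if the ID was already seen, so repeats are skipped.
            if (backend != null && addedIds.add(backend.id)) {
                selected.add(backend);
            }
        }
        return selected;
    }

    /** Safety net: list each backend ID once, even if the input list has repeats. */
    static List<Long> planTaskTargets(List<Backend> backends) {
        Set<Long> addedBackendIds = new HashSet<>();
        List<Long> targets = new ArrayList<>();
        for (Backend backend : backends) {
            if (addedBackendIds.add(backend.id)) {
                targets.add(backend.id);
            }
        }
        return targets;
    }

    public static void main(String[] args) {
        Backend be1 = new Backend(10001L, "10.0.0.1:9050");
        Backend be2 = new Backend(10002L, "10.0.0.2:9050");
        // Assumed for illustration: two different strings resolve to be1.
        Map<String, Backend> byAddress = Map.of(
                "10.0.0.1:9050", be1,
                "be1.example:9050", be1,
                "10.0.0.2:9050", be2);

        List<Backend> selected = selectBackends(
                List.of("10.0.0.1:9050", "10.0.0.1:9050", "be1.example:9050", "10.0.0.2:9050"),
                byAddress);
        System.out.println(planTaskTargets(selected)); // prints [10001, 10002]
    }
}
```

Keying the set on the backend ID rather than on the query string is what makes
an exact repeat and an alias collapse to a single entry. The second pass is
redundant when the first one runs, but it still protects the task-creation
step if the list is built through another path.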
   
   ## Release note
   
   Fixed a bug where ADMIN CLEAN TRASH could send duplicate clean tasks to the 
same backend when duplicate backend addresses were specified.
   
   ## Check List (For Author)
   
   - Test: Unit Test
       - `AdminCleanTrashCommandTest` verifies dedup with duplicate queries, 
distinct queries, and clean-all scenarios
   - Behavior changed: No
   - Does this need documentation: No


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
