deardeng opened a new pull request, #60480:
URL: https://github.com/apache/doris/pull/60480
## Proposed changes
### Problem
During cloud tablet decommission, some tablets take an unexpectedly long time
(5+ minutes) to migrate because FE keeps waiting for warmup tasks to complete,
even though the tasks have already failed on the BE side.
**Root cause**: In `FileCacheBlockDownloader::download_file_cache_block()`,
when an early return occurs (e.g., tablet not found, rowset not found, storage
resource error), the tablet's count in `_inflight_tablets` is not decremented.
This causes:
1. `check_download_task()` always returns `done=false` for these tablets
2. FE's `checkInflightWarmUpCacheAsync()` waits until timeout (default 300
seconds)
3. Tablet migration is blocked unnecessarily
**Example log showing the issue**:
```
W download_file_cache_block: tablet_id=1769675033824 rowset_id not found,
rowset_id=020000000010fa85...
```
After this warning, the tablet's inflight count remains in the
`_inflight_tablets` map, so FE waits the full 5 minutes before it times out
and proceeds.
### Solution
1. Extract the inflight count decrement logic into a reusable lambda
`decrease_inflight_count`
2. Call `decrease_inflight_count()` in all early return paths:
   - When `get_tablet()` fails
   - When `rowset_id` is not found
   - When `remote_storage_resource()` fails
3. Refactor the `download_done` callback to reuse `decrease_inflight_count`,
eliminating code duplication
4. Capture `decrease_inflight_count` by value in the `download_done` lambda
to ensure lifetime safety if the callback is ever called asynchronously in the
future
5. Add unit tests to verify that the inflight count is correctly decremented on
failures
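
The pattern described in the steps above can be sketched roughly as follows. This is a simplified, hypothetical model and not the actual Doris code: `InflightTracker`, `submit()`, `done()`, `tracked()`, and the boolean parameters standing in for the `get_tablet()`, rowset lookup, and `remote_storage_resource()` results are all assumptions made for illustration. Only the names `_inflight_tablets`, `decrease_inflight_count`, and `download_done` come from this PR.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <mutex>
#include <unordered_map>

// Hypothetical stand-in for the downloader's inflight bookkeeping.
class InflightTracker {
public:
    // Register one pending warmup task for a tablet.
    void submit(int64_t tablet_id) {
        std::lock_guard<std::mutex> lock(_mtx);
        ++_inflight_tablets[tablet_id];
    }

    // Loosely models check_download_task(): done once no entry remains.
    bool done(int64_t tablet_id) const {
        std::lock_guard<std::mutex> lock(_mtx);
        return _inflight_tablets.find(tablet_id) == _inflight_tablets.end();
    }

    // Number of tablets still tracked (useful for spotting leaked entries).
    std::size_t tracked() const {
        std::lock_guard<std::mutex> lock(_mtx);
        return _inflight_tablets.size();
    }

    // The three flags simulate the lookups that can fail in the real function.
    bool download(int64_t tablet_id, bool tablet_found, bool rowset_found,
                  bool storage_ok) {
        // Single place that decrements the count and erases the entry at zero.
        auto decrease_inflight_count = [this, tablet_id]() {
            std::lock_guard<std::mutex> lock(_mtx);
            auto it = _inflight_tablets.find(tablet_id);
            if (it == _inflight_tablets.end()) return;
            if (--it->second <= 0) _inflight_tablets.erase(it);
        };

        // Every early return path releases the inflight count first.
        if (!tablet_found) { decrease_inflight_count(); return false; }
        if (!rowset_found) { decrease_inflight_count(); return false; }
        if (!storage_ok)   { decrease_inflight_count(); return false; }

        // The callback holds its own copy of the lambda (value capture), so it
        // stays valid even if it were invoked after this function returns.
        std::function<void()> download_done = [decrease_inflight_count]() {
            decrease_inflight_count();
        };
        download_done();  // Invoked synchronously in this sketch.
        return true;
    }

private:
    mutable std::mutex _mtx;
    std::unordered_map<int64_t, int> _inflight_tablets;
};
```

In this model, calling `submit(1)` followed by `download(1, true, false, true)` (rowset not found) leaves `done(1)` returning `true` and `tracked()` at zero, which is the behavior the new unit tests are meant to verify for the real downloader.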
## Further comments
This bug also causes a minor memory leak: entries in the `_inflight_tablets`
map are never cleaned up when warmup fails, so they slowly accumulate until the
BE restarts.
## Checklist (Required)
1. Does it affect the original behavior:
- [ ] Yes
- [x] No
2. Have unit tests been added:
- [x] Yes
- [ ] No
3. Has documentation been added or modified:
- [ ] Yes
- [x] No
4. Does it need to update dependencies:
- [ ] Yes
- [x] No
5. Are there any sharding changes:
- [ ] Yes
- [x] No