heguanhui opened a new pull request, #64190:
URL: https://github.com/apache/doris/pull/64190
### What problem does this PR solve?
Issue Number: close #64189
Problem Summary:
`BlockFileCacheTest` has two types of flaky failures. Both are test defects,
not defects in the business code.
**Type 1: Background thread interference** (affects `ttl_modify`,
`io_error`, and all tests calling `test_file_cache`)
The `test_file_cache()` helper creates a `BlockFileCache` that starts
background threads (evict_in_advance at 1000ms, gc at 100ms, block_lru_update
at 5000ms, monitor at 5000ms). These threads asynchronously modify cache state
between test assertions, causing:
- `ttl_modify` failure: `file_block->state()` returns EMPTY instead of
SKIP_CACHE. The background evict_in_advance thread evicts releasable DOWNLOADED
blocks, freeing space so that `try_reserve()` unexpectedly succeeds, keeping
the state as EMPTY instead of transitioning to SKIP_CACHE.
```
be/test/io/cache/block_file_cache_test.cpp:447: Failure
Expected: file_block->state() == io::FileBlock::State::SKIP_CACHE
Actual: EMPTY == SKIP_CACHE
```
- `io_error` failure: `mgr.get_file_blocks_num()` returns 10 instead of 9.
A `FileBlocksHolder` destructor removes an EMPTY block only when
`use_count()==2`. If the background thread or another holder still holds a
reference (`use_count()>2`), the removal is skipped and the block stays in the
queue.
```
be/test/io/cache/block_file_cache_test.cpp:530: Failure
Expected: mgr.get_file_blocks_num(key) == 9
Actual: 10 == 9
```
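The reference-count condition behind the `io_error` failure can be illustrated
with a minimal sketch. It assumes the blocks are held through
`std::shared_ptr`; `Block`, `try_remove` and `queue_size_after_removal` are
hypothetical names for illustration, not the actual Doris code.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a cache block.
struct Block {
    int id;
};

// Illustrative removal rule modeled on the description above: drop the block
// only when exactly two references exist (the queue's and the caller's).
bool try_remove(std::vector<std::shared_ptr<Block>>& queue,
                const std::shared_ptr<Block>& held) {
    if (held.use_count() != 2) {
        return false; // someone else still references the block
    }
    for (auto it = queue.begin(); it != queue.end(); ++it) {
        if (*it == held) {
            queue.erase(it);
            return true;
        }
    }
    return false;
}

// Returns the queue size after one removal attempt, optionally with an extra
// reference alive (as a background thread or second holder would have).
std::size_t queue_size_after_removal(bool extra_reference) {
    std::vector<std::shared_ptr<Block>> queue;
    queue.push_back(std::make_shared<Block>(Block{1}));
    std::shared_ptr<Block> held = queue.front(); // use_count() == 2
    std::shared_ptr<Block> extra;
    if (extra_reference) {
        extra = held; // use_count() == 3
    }
    try_remove(queue, held);
    return queue.size();
}
```

With no extra reference the block is removed. With one, it stays in the queue,
which mirrors the off-by-one count seen in the assertion.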
Fix (Commit 1): Save the relevant config values, set all background thread
intervals to 10000000ms for the duration of `test_file_cache` and
`test_file_cache_memory_storage`, then restore the saved values.
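One way to express the save-and-restore pattern is an RAII guard. This is a
sketch only: the `demo_config` variables and the `ScopedOverride` class are
hypothetical stand-ins, and the actual commit may save and restore the values
differently.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the real config variables (names are assumed).
namespace demo_config {
std::int64_t gc_interval_ms = 100;
std::int64_t evict_in_advance_interval_ms = 1000;
} // namespace demo_config

// RAII guard: remembers the current value, applies the override, and restores
// the saved value when the guard goes out of scope.
template <typename T>
class ScopedOverride {
public:
    ScopedOverride(T& target, T value) : _target(target), _saved(target) {
        _target = value;
    }
    ~ScopedOverride() { _target = _saved; }
    ScopedOverride(const ScopedOverride&) = delete;
    ScopedOverride& operator=(const ScopedOverride&) = delete;

private:
    T& _target;
    T _saved;
};

// Returns the gc interval observed while the overrides are active.
std::int64_t run_with_quiet_background_threads() {
    constexpr std::int64_t kEffectivelyNever = 10000000; // ms, value from the PR
    ScopedOverride<std::int64_t> gc(demo_config::gc_interval_ms,
                                    kEffectivelyNever);
    ScopedOverride<std::int64_t> evict(
            demo_config::evict_in_advance_interval_ms, kEffectivelyNever);
    // A test body would run here while background work is effectively paused.
    return demo_config::gc_interval_ms;
}
```

Because restoration happens in the destructor, the original values come back
even if the test body returns early.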
**Type 2: Insufficient async open timeout** (affects
`evict_privilege_order_for_ttl` and all 91 tests using `get_async_open_success`)
`initialize()` starts a background disk I/O loading thread that sets
`_async_open_done=true` only on completion. Tests wait only 100ms (100
iterations × 1ms), which is insufficient under high CPU load:
```
be/test/io/cache/block_file_cache_test.cpp:6980: Failure
(async open not completed, cache not ready for get_or_set operations)
```
Fix (Commit 2): Extract a `wait_for_async_open()` helper that polls for up to
1000 iterations × 10ms (10s total timeout), and replace all 91 inline wait
loops with it. Compared with the original 1ms interval, the 10ms sleep interval
also avoids adding CPU pressure when the machine is already under load.
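A bounded polling wait of this shape might look like the sketch below. The
signature of the real `wait_for_async_open()` is not shown in this message, so
`wait_for_flag` and `simulate_slow_open` are hypothetical names that poll a
plain `std::atomic<bool>` instead of the cache object.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Polls `done` up to `max_iterations` times, sleeping `interval` between
// checks. Returns true once the flag is observed set, false on timeout.
// Defaults follow the PR description: 1000 iterations x 10ms = 10s.
bool wait_for_flag(const std::atomic<bool>& done, int max_iterations = 1000,
                   std::chrono::milliseconds interval =
                           std::chrono::milliseconds(10)) {
    for (int i = 0; i < max_iterations; ++i) {
        if (done.load(std::memory_order_acquire)) {
            return true;
        }
        std::this_thread::sleep_for(interval);
    }
    // One final check after the last sleep.
    return done.load(std::memory_order_acquire);
}

// Simulates a slow background loader that sets the flag after `delay`.
bool simulate_slow_open(std::chrono::milliseconds delay) {
    std::atomic<bool> done{false};
    std::thread loader([&done, delay] {
        std::this_thread::sleep_for(delay);
        done.store(true, std::memory_order_release);
    });
    const bool ok = wait_for_flag(done);
    loader.join();
    return ok;
}
```

A loader that needs 150ms would exceed the old 100ms budget but fits easily
inside the 10s one, and the helper still returns as soon as the flag is set.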
### Release note
None
### Check List (For Author)
- Test: Unit Test (BlockFileCacheTest)
- Behavior changed: No
- Does this need documentation: No
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]