github-actions[bot] commented on code in PR #64770:
URL: https://github.com/apache/doris/pull/64770#discussion_r3464511696
##########
be/src/io/cache/block_file_cache.cpp:
##########
@@ -757,15 +770,15 @@ std::string BlockFileCache::clear_file_cache_async() {
FileBlockSPtr file_block = cell->file_block;
if (file_block) {
std::lock_guard block_lock(file_block->_mutex);
Review Comment:
This new sync clear path does not preserve the TTL bookkeeping cleanup that
`clear_file_cache_directly` used to do. For TTL blocks, `remove(...)` updates
`_cur_ttl_size` and erases `_files`, but it never removes the hash from
`_key_to_time` or `_time_to_key`; after the clear, `add_cell` will see the
stale `_key_to_time` entry and skip inserting a fresh `_time_to_key` record for
the same hash. That means a newly cached TTL block can be governed by the old
expiration, or be missed by TTL GC bookkeeping entirely. Please clear the TTL
maps for fully removed hashes, or route TTL entries through
`remove_if_ttl_file_blocks`/equivalent cleanup when clearing the cache.
##########
be/src/io/cache/block_file_cache.cpp:
##########
@@ -745,7 +754,11 @@ std::string BlockFileCache::clear_file_cache_async() {
}
}
- // we cannot delete the element in the loop above, because it will
break the iterator
Review Comment:
The new sync clear is only walking `_files`, so it no longer clears cache
data that exists on disk but is not currently represented in memory. The old
`clear_file_cache_directly()` called `_storage->clear()`, and
`FSFileCacheStorage::clear()` sweeps the cache base path and deletes every
cache subdirectory. With this replacement, any block that has not been loaded
into `_files` yet, or any leftover cache/tmp directory that is not tracked in
`_files`, survives a successful `clear_file_caches(true)` response. Please keep
a storage-level sweep for the synchronous full clear path, or otherwise delete
untracked files before reporting that the cache was cleared.
##########
be/src/io/cache/block_file_cache.cpp:
##########
@@ -745,7 +754,11 @@ std::string BlockFileCache::clear_file_cache_async() {
}
}
- // we cannot delete the element in the loop above, because it will
break the iterator
+ // Do not erase while walking _files above: remove() may erase the
current map element.
+ //
+ // sync_remove only changes how already releasable DOWNLOADED blocks
are deleted from
Review Comment:
Marking busy blocks as deleting is not enough for the synchronous clear
semantics, because these cells remain in `_files` and `get_impl()`/`use_cell()`
do not filter `is_deleting()`. A reader that starts after
`clear_file_caches(true)` returns can still receive this old `FileBlockSPtr`
and read the pre-clear cached data, and each new holder delays the eventual
removal. The old direct clear removed `_files` immediately, so held blocks were
not discoverable by later lookups. Please keep deleting blocks out of future
`get_or_set()` results while preserving the safe holder cleanup path.
##########
be/src/http/action/file_cache_action.cpp:
##########
@@ -84,11 +84,20 @@ Status FileCacheAction::_handle_header(HttpRequest* req,
std::string* json_metri
const std::string& sync = req->param(std::string(SYNC));
const std::string& segment_path = req->param(std::string(VALUE));
if (segment_path.empty()) {
- io::FileCacheFactory::instance()->clear_file_caches(to_lower(sync)
== "true");
+ const bool sync_clear = to_lower(sync) == "true";
+ std::string clear_msg;
+ RETURN_IF_ERROR(
+
io::FileCacheFactory::instance()->clear_file_caches(sync_clear, &clear_msg));
+ if (sync_clear) {
+ EasyJson json;
+ json["status"] = "OK";
+ json["msg"] = clear_msg;
+ *json_metrics = json.ToString();
+ }
} else {
io::UInt128Wrapper hash = io::BlockFileCache::hash(segment_path);
io::BlockFileCache* cache =
io::FileCacheFactory::instance()->get_by_path(hash);
- cache->remove_if_cached(hash);
+ cache->remove_if_cached_async(hash);
Review Comment:
The value-scoped clear path still accepts `sync=true`, but this branch now
ignores it and always queues `remove_if_cached_async`. Before this change,
clearing a single `value` used `remove_if_cached`, so callers such as
`op=clear&sync=true&value=...` could rely on releasable blocks being physically
removed before the HTTP request returned. With the async call the in-memory
entry disappears immediately, but the cache file can remain in storage until
background GC runs, which is a compatibility regression for the existing
sync/value callers. Please preserve the sync flag here, e.g. use
`remove_if_cached(hash)` when `sync_clear` is true and the async variant
otherwise.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]