github-actions[bot] commented on code in PR #64770:
URL: https://github.com/apache/doris/pull/64770#discussion_r3464511696


##########
be/src/io/cache/block_file_cache.cpp:
##########
@@ -757,15 +770,15 @@ std::string BlockFileCache::clear_file_cache_async() {
             FileBlockSPtr file_block = cell->file_block;
             if (file_block) {
                 std::lock_guard block_lock(file_block->_mutex);

Review Comment:
   This new sync clear path does not preserve the TTL bookkeeping cleanup that 
`clear_file_cache_directly` used to do. For TTL blocks, `remove(...)` updates 
`_cur_ttl_size` and erases `_files`, but it never removes the hash from 
`_key_to_time` or `_time_to_key`; after the clear, `add_cell` will see the 
stale `_key_to_time` entry and skip inserting a fresh `_time_to_key` record for 
the same hash. That means a newly cached TTL block can be governed by the old 
expiration, or be missed by TTL GC bookkeeping entirely. Please clear the TTL 
maps for fully removed hashes, or route TTL entries through 
`remove_if_ttl_file_blocks`/equivalent cleanup when clearing the cache.



##########
be/src/io/cache/block_file_cache.cpp:
##########
@@ -745,7 +754,11 @@ std::string BlockFileCache::clear_file_cache_async() {
             }
         }
 
-        // we cannot delete the element in the loop above, because it will 
break the iterator

Review Comment:
   The new sync clear is only walking `_files`, so it no longer clears cache 
data that exists on disk but is not currently represented in memory. The old 
`clear_file_cache_directly()` called `_storage->clear()`, and 
`FSFileCacheStorage::clear()` sweeps the cache base path and deletes every 
cache subdirectory. With this replacement, any block that has not been loaded 
into `_files` yet, or any leftover cache/tmp directory that is not tracked in 
`_files`, survives a successful `clear_file_caches(true)` response. Please keep 
a storage-level sweep for the synchronous full clear path, or otherwise delete 
untracked files before reporting that the cache was cleared.



##########
be/src/io/cache/block_file_cache.cpp:
##########
@@ -745,7 +754,11 @@ std::string BlockFileCache::clear_file_cache_async() {
             }
         }
 
-        // we cannot delete the element in the loop above, because it will 
break the iterator
+        // Do not erase while walking _files above: remove() may erase the 
current map element.
+        //
+        // sync_remove only changes how already releasable DOWNLOADED blocks 
are deleted from

Review Comment:
   Marking busy blocks as deleting is not enough for the synchronous clear 
semantics, because these cells remain in `_files` and `get_impl()`/`use_cell()` 
do not filter `is_deleting()`. A reader that starts after 
`clear_file_caches(true)` returns can still receive this old `FileBlockSPtr` 
and read the pre-clear cached data, and each new holder delays the eventual 
removal. The old direct clear removed `_files` immediately, so held blocks were 
not discoverable by later lookups. Please keep deleting blocks out of future 
`get_or_set()` results while preserving the safe holder cleanup path.



##########
be/src/http/action/file_cache_action.cpp:
##########
@@ -84,11 +84,20 @@ Status FileCacheAction::_handle_header(HttpRequest* req, 
std::string* json_metri
         const std::string& sync = req->param(std::string(SYNC));
         const std::string& segment_path = req->param(std::string(VALUE));
         if (segment_path.empty()) {
-            io::FileCacheFactory::instance()->clear_file_caches(to_lower(sync) 
== "true");
+            const bool sync_clear = to_lower(sync) == "true";
+            std::string clear_msg;
+            RETURN_IF_ERROR(
+                    
io::FileCacheFactory::instance()->clear_file_caches(sync_clear, &clear_msg));
+            if (sync_clear) {
+                EasyJson json;
+                json["status"] = "OK";
+                json["msg"] = clear_msg;
+                *json_metrics = json.ToString();
+            }
         } else {
             io::UInt128Wrapper hash = io::BlockFileCache::hash(segment_path);
             io::BlockFileCache* cache = 
io::FileCacheFactory::instance()->get_by_path(hash);
-            cache->remove_if_cached(hash);
+            cache->remove_if_cached_async(hash);

Review Comment:
   The value-scoped clear path still accepts `sync=true`, but this branch now 
ignores it and always queues `remove_if_cached_async`. Before this change, 
clearing a single `value` used `remove_if_cached`, so callers such as 
`op=clear&sync=true&value=...` could rely on releasable blocks being physically 
removed before the HTTP request returned. With the async call the in-memory 
entry disappears immediately, but the cache file can remain in storage until 
background GC runs, which is a compatibility regression for the existing 
sync/value callers. Please preserve the sync flag here, e.g. use 
`remove_if_cached(hash)` when `sync_clear` is true and the async variant 
otherwise.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to