github-actions[bot] commented on code in PR #63376:
URL: https://github.com/apache/doris/pull/63376#discussion_r3331503580
##########
be/src/exec/scan/file_scanner.h:
##########
@@ -325,13 +326,6 @@ class FileScanner : public Scanner {
: _local_state->get_push_down_agg_type();
}
- // enable the file meta cache only when
- // 1. max_external_file_meta_cache_num is > 0
- // 2. the file number is less than 1/3 of cache's capacibility
- // Otherwise, the cache miss rate will be high
- bool _should_enable_file_meta_cache() {
- return ExecEnv::GetInstance()->file_meta_cache()->enabled() &&
- _split_source->num_scan_ranges() < config::max_external_file_meta_cache_num / 3;
- }
Review Comment:
This changes the scanner admission policy from the existing in-memory footer
cache (`ExecEnv::file_meta_cache()->enabled()` plus the scan-range threshold)
to the new disk-cache switch only. With the default
`enable_external_file_meta_disk_cache=false`, Parquet/ORC readers now receive
`nullptr` at `file_scanner.cpp:1163/1175`, so they bypass `FileMetaCache`
entirely and repeatedly parse remote footers instead of using the pre-existing
L1 memory cache. The PR description says this adds a disk-backed layer to the
existing memory cache, so please keep the old memory-cache admission path when
the disk cache is disabled, and only use the new flag to control persistent
lookup/insert.
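
A minimal sketch of the split suggested above, assuming the two decisions can be made independently. `MetaCacheAdmission` and its field names are hypothetical stand-ins for the real scanner state, not existing Doris code; only the memory-cache rule is taken from the removed `_should_enable_file_meta_cache()`.

```cpp
#include <cstdint>

// Hypothetical holder for the inputs of the admission decision.
// Each field stands in for a value the scanner already has access to.
struct MetaCacheAdmission {
    bool memory_cache_enabled;  // ExecEnv::GetInstance()->file_meta_cache()->enabled()
    bool disk_cache_enabled;    // enable_external_file_meta_disk_cache
    int64_t num_scan_ranges;    // _split_source->num_scan_ranges()
    int64_t max_cache_num;      // config::max_external_file_meta_cache_num

    // Same rule as the removed helper: admit to the in-memory cache only when
    // it is enabled and the file count is below 1/3 of the cache capacity.
    bool use_memory_cache() const {
        return memory_cache_enabled && num_scan_ranges < max_cache_num / 3;
    }

    // Assumption: the new flag gates only persistent lookup/insert and does
    // not influence whether the in-memory cache is used.
    bool use_disk_cache() const { return disk_cache_enabled; }
};
```

With this shape, the readers would still receive the `FileMetaCache` pointer whenever `use_memory_cache()` is true, regardless of the disk-cache flag. Whether the disk layer should additionally require memory-cache admission is left open here.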
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]