zclllyybb commented on issue #64657: URL: https://github.com/apache/doris/issues/64657#issuecomment-4764132427
Breakwater-GitHub-Analysis-Slot: slot_ca3e2afdc689

This content is generated by AI for reference only.

Initial code-backed triage:

The symptom is focused on the local BE file-cache layer, not on tablet distribution by itself. In 4.1.0-rc03, `cached_remote_reader_self_heal_on_not_found` is incremented only when `CachedRemoteFileReader` sees a cache block marked `DOWNLOADED`, then the local cache-file read returns `NOT_FOUND`. Doris then falls back to the remote reader for correctness and schedules `_cache->remove_if_cached_async(_cache_hash)`.

This is the same class of stale file-cache metadata/local-file mismatch that public PRs #60977 and #61205 addressed. 4.1.0-rc03 already contains that self-heal logic, so a continuously rising counter means the affected BE is still repeatedly reaching stale `DOWNLOADED` entries, or cache blocks are being written and then immediately removed/not retained.

`BytesWriteIntoCache` should not be treated as proof that the cache was successfully persisted. In this version, the read path increments the profile's write-into-cache bytes for the block after the remote-read write-back loop even when `append()` or `finalize()` failed and logged `Write data to file cache failed`. Therefore the reported 49-70 MB per query can still coexist with zero reusable cache if the cache path has write/rename/delete errors, inode/space pressure, or aggressive eviction.

Most suspicious directions for this single BE:

1. Cache path disk or inode pressure. Defaults enter disk resource limit mode at 90% and evict-in-advance at 88%, which matches the reported high IO/utilization range. In that state Doris can repeatedly do remote read -> local cache write -> eviction/removal -> next-query miss.
2. File-cache metadata and cache files are inconsistent on that BE. The v3 file cache loads block metadata from the local RocksDB meta store, and missing local block files then surface as `DOWNLOADED + NOT_FOUND`.
3. Cache-file deletion/removal is delayed or failing.
   Please check the recycle queue, async remove logs, and RocksDB meta-store write/delete failures.
4. If `enable_read_cache_file_directly=true` on this BE, also check the direct-read path. That path reads cached blocks through `_cache_file_readers`; a local read failure only breaks out to the indirect path and does not perform the same self-heal in the direct-read branch.

Useful evidence to attach from the affected BE and one normal BE:

- BE log snippets around the spike for: `Cache block file is missing, will self-heal by clearing cache hash`, `Read data failed from file cache downloaded by others`, `Write data to file cache failed`, `open file failed with both v3 and v2 format`, `mode run in resource limit`, `need evict cache in advance`, `Failed to write to rocksdb`, and `Failed to delete to rocksdb`.
- Bvar metrics for the affected cache path: `cached_remote_reader_self_heal_on_not_found`, `cached_remote_reader_s3_read`, `cached_remote_reader_peer_read`, `cached_remote_reader_failed_get_peer_addr_counter`, file-cache hit ratio/no-warmup hit ratio, cache size/capacity, queue sizes, per-reason evict bytes, `file_cache_total_evict_size`, `file_cache_disk_limit_mode`, `file_cache_need_evict_cache_in_advance`, `file_cache_recycle_keys_length`, `file_cache_meta_store_write_queue_size`, `file_cache_meta_rocksdb_write_failed_num`, and `file_cache_meta_rocksdb_delete_failed_num`.
- The BE config values for `file_cache_path`, cache capacity, `file_cache_each_block_size`, `enable_read_cache_file_directly`, `enable_evict_file_cache_in_advance`, `file_cache_remove_block_qps_limit`, and `file_cache_background_gc_interval_ms`.
- `df -h` and `df -i` for the cache mount on the bad BE and a normal BE, plus whether any external cleanup, pod reschedule, disk replacement, or BE restart happened before the counter spike.
- The same query profile from the affected BE and a normal BE, including local/remote/peer bytes and timers, `BytesWriteIntoCache`, and write-cache time.
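To make the log and disk evidence easier to compare between the bad BE and a normal one, the two checks can be scripted. The following is a minimal sketch, not part of Doris: the function names are hypothetical, the log strings are the ones quoted above, and the 88%/90% thresholds are the defaults mentioned in this triage. It assumes the thresholds apply the same way to space and inode usage and that reaching a threshold counts as crossing it; the actual comparison inside the BE may differ, and the percentages may deviate slightly from what `df` prints.

```python
import os
import sys
from collections import Counter

# Log messages quoted in the evidence list of this triage.
SIGNATURES = [
    "Cache block file is missing, will self-heal by clearing cache hash",
    "Read data failed from file cache downloaded by others",
    "Write data to file cache failed",
    "open file failed with both v3 and v2 format",
    "mode run in resource limit",
    "need evict cache in advance",
    "Failed to write to rocksdb",
    "Failed to delete to rocksdb",
]


def count_signatures(lines):
    """Count the log lines that contain each signature (plain substring match)."""
    counts = Counter()
    for line in lines:
        for sig in SIGNATURES:
            if sig in line:
                counts[sig] += 1
    return counts


def usage_percent(path):
    """Return (space_used_pct, inode_used_pct) for the filesystem holding path."""
    st = os.statvfs(path)
    # Some filesystems report zero totals (for example no fixed inode table).
    space = 100.0 * (st.f_blocks - st.f_bfree) / st.f_blocks if st.f_blocks else 0.0
    inodes = 100.0 * (st.f_files - st.f_ffree) / st.f_files if st.f_files else 0.0
    return space, inodes


def pressure_state(used_pct, evict_in_advance_pct=88.0, limit_pct=90.0):
    """Classify a usage percentage against the thresholds (assumed to be inclusive)."""
    if used_pct >= limit_pct:
        return "resource_limit"
    if used_pct >= evict_in_advance_pct:
        return "evict_in_advance"
    return "normal"


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python check_cache.py <be_log_file> <cache_mount_path>
    log_path, cache_path = sys.argv[1], sys.argv[2]
    with open(log_path, errors="replace") as f:
        for sig, n in count_signatures(f).items():
            print(f"{n:8d}  {sig}")
    space, inodes = usage_percent(cache_path)
    print(f"space  {space:5.1f}%  {pressure_state(space)}")
    print(f"inodes {inodes:5.1f}%  {pressure_state(inodes)}")
```

Running it with the same arguments on the affected BE and on a normal BE gives two small reports that can be pasted side by side; both the log file location and the cache mount path have to be supplied by the operator, since neither is known from this issue.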

Short-term mitigation, after preserving the above evidence: if this is isolated to one BE and cold-cache refill is acceptable, clearing the file cache on the affected BE with the existing `/api/file_cache?op=clear&sync=true` path should remove stale cache metadata/files and force a clean rebuild. If the counter immediately grows again after that, the remaining root cause is likely ongoing cache write/finalize failure, disk/inode pressure, or external deletion of cache files rather than old stale metadata.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
