xuchenhao opened a new issue, #59504:
URL: https://github.com/apache/doris/issues/59504

   ### Search before asking
   
   - [x] I have searched the [issues](https://github.com/apache/doris/issues?q=is%3Aissue) and found no similar issues.
   
   
   ### Version
   
   master
   
   ### What's Wrong?
   
   **Location**: `be/src/io/cache/fs_file_cache_storage.cpp` in 
`FSFileCacheStorage::load_cache_info_into_memory()` (around line 880)
   
   **Description**:
   In the cache loading logic, the consistency between the RocksDB metadata and the filesystem is measured with:
   ```
   double difference_ratio =
           (static_cast<double>(estimated_file_count) - static_cast<double>(db_block_count)) /
           static_cast<double>(estimated_file_count);
   ```
   This formula assumes `estimated_file_count >= db_block_count`, where: 
   - `estimated_file_count = directory_size / 1MB` (upper-bound assumption)
   - `db_block_count` = actual cache blocks loaded from RocksDB
   
   However, in **data lake scenarios** with many small files (<1MB), this 
estimation becomes an **underestimation**, resulting in `estimated_file_count < 
db_block_count` and producing negative `difference_ratio` values.
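
To make the sign problem concrete, here is a minimal sketch of the arithmetic. The helper names `estimate_file_count` and `signed_difference_ratio` are hypothetical and not taken from the Doris source, and the numbers (1000 cached blocks of 256 KB each) are an invented illustration rather than measured data:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: the estimate described in the issue,
// i.e. directory size divided by 1 MB.
std::size_t estimate_file_count(std::size_t directory_size_bytes) {
    constexpr std::size_t kOneMB = 1024 * 1024;
    return directory_size_bytes / kOneMB;
}

// Hypothetical helper mirroring the signed formula quoted above.
double signed_difference_ratio(std::size_t estimated_file_count,
                               std::size_t db_block_count) {
    assert(estimated_file_count > 0);  // guard added for this sketch only
    return (static_cast<double>(estimated_file_count) -
            static_cast<double>(db_block_count)) /
           static_cast<double>(estimated_file_count);
}
```

With 1000 blocks of 256 KB, the directory holds 250 MB, so `estimate_file_count` returns 250 while RocksDB reports 1000 blocks. The signed ratio is then (250 - 1000) / 250 = -3.0.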
   
   **Impact**:
   1. Inaccurate metric: A negative ratio does not represent the actual magnitude of the discrepancy
   2. Wrong decisions: The filesystem reload may be incorrectly skipped, because a negative `difference_ratio` falls below the threshold no matter how large the actual discrepancy is
   
   ### What You Expected?
   
   **Suggested fix**: 
   Use absolute value to measure the discrepancy magnitude:
   ```
   double difference_ratio =
           std::abs(static_cast<double>(estimated_file_count) - static_cast<double>(db_block_count)) /
           static_cast<double>(estimated_file_count);
   ```
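
As a rough sketch of how the absolute-value form changes the outcome, the snippet below applies it to a threshold check. The function names, the `ratio > threshold` comparison direction, and the example threshold of 0.3 are all assumptions made for illustration; the actual check and threshold in `load_cache_info_into_memory()` may differ. Note that the `double` overload of `std::abs` comes from `<cmath>`:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical helper: the suggested formula using the absolute difference.
double abs_difference_ratio(std::size_t estimated_file_count,
                            std::size_t db_block_count) {
    assert(estimated_file_count > 0);  // guard added for this sketch only
    return std::abs(static_cast<double>(estimated_file_count) -
                    static_cast<double>(db_block_count)) /
           static_cast<double>(estimated_file_count);
}

// Assumed decision rule: reload from the filesystem when the ratio
// exceeds the threshold. The real condition in Doris may differ.
bool should_reload_from_fs(double difference_ratio, double threshold) {
    return difference_ratio > threshold;
}
```

For an estimate of 250 files against 1000 blocks in RocksDB, the absolute ratio is 3.0, so `should_reload_from_fs(3.0, 0.3)` is true. The signed value of -3.0 would give false under the same assumed rule.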
   
   ### How to Reproduce?
   
   _No response_
   
   ### Anything Else?
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

