Re: [PR] [HUDI-7219] Add caching support for HFileBlocks [hudi]

via GitHub Tue, 16 Sep 2025 06:47:37 -0700


voonhous commented on code in PR #13724:
URL: https://github.com/apache/hudi/pull/13724#discussion_r2352521482



##########
hudi-common/src/main/java/org/apache/hudi/common/config/HoodieReaderConfig.java:
##########
@@ -89,4 +89,32 @@ public class HoodieReaderConfig extends HoodieConfig {
       "hoodie.write.record.merge.custom.implementation.classes";
   public static final String RECORD_MERGE_IMPL_CLASSES_DEPRECATED_WRITE_CONFIG_KEY =
       "hoodie.datasource.write.record.merger.impls";
+
+  public static final ConfigProperty<Boolean> HFILE_BLOCK_CACHE_ENABLED = ConfigProperty
+      .key("hoodie.hfile.block.cache.enabled")
+      .defaultValue(false)
+      .markAdvanced()
+      .sinceVersion("1.1.0")
+      .withDocumentation("Enable HFile block-level caching for metadata files. This caches frequently "
+          + "accessed HFile blocks in memory to reduce I/O operations during metadata queries. "
+          + "Improves performance for workloads with repeated metadata access patterns.");
+
+  public static final ConfigProperty<Integer> HFILE_BLOCK_CACHE_SIZE = ConfigProperty
+      .key("hoodie.hfile.block.cache.size")
+      .defaultValue(100)

Review Comment:
   I was trying to implement the size-based cache using Caffeine's `maximumWeight` option instead of `maximumSize`.
   Suppose we go size-based with, for example, a cache size of **1000**.
   
   If we have 3 blocks:
   Block 1: size 800
   Block 2: size 150
   Block 3: size 300
   
   And we put Blocks 1, 2, and 3 in that order, only Blocks 1 and 2 will be admitted into the cache.
   Block 3 will be rejected, because admitting it would bring the total size of the cache to **1250 (> 1000)**.
   
   Block 3 will keep being rejected until Block 1 has gone unaccessed for the expiry time limit (say 30 minutes) and is dropped from the cache.
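
To make the arithmetic above concrete, here is a minimal sketch of a weight-bounded cache that simply refuses any entry that would push the total weight past the limit. It is a toy model of the scenario described in this comment, not Caffeine's actual admission or eviction policy. The class `WeightBoundedCacheSketch` and its `put`/`expire` methods are hypothetical names used only for illustration, and expiry is triggered by hand rather than by a timer.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Toy model of a weight-bounded cache (assumption: an entry that does not fit
 * is rejected outright). This is NOT how Caffeine is implemented.
 */
final class WeightBoundedCacheSketch {
  private final long maxWeight;
  private long currentWeight = 0;
  private final Map<String, Integer> blocks = new LinkedHashMap<>();

  WeightBoundedCacheSketch(long maxWeight) {
    this.maxWeight = maxWeight;
  }

  /** Admits the block only if the total weight stays within the limit. */
  boolean put(String key, int weight) {
    if (blocks.containsKey(key)) {
      return true; // already cached
    }
    if (currentWeight + weight > maxWeight) {
      return false; // would overflow the cache, so reject
    }
    blocks.put(key, weight);
    currentWeight += weight;
    return true;
  }

  /** Simulates an entry expiring after going unaccessed for the time limit. */
  void expire(String key) {
    Integer weight = blocks.remove(key);
    if (weight != null) {
      currentWeight -= weight;
    }
  }

  long currentWeight() {
    return currentWeight;
  }

  public static void main(String[] args) {
    WeightBoundedCacheSketch cache = new WeightBoundedCacheSketch(1000);
    System.out.println(cache.put("block1", 800)); // true  (total 800)
    System.out.println(cache.put("block2", 150)); // true  (total 950)
    System.out.println(cache.put("block3", 300)); // false (950 + 300 = 1250 > 1000)
    cache.expire("block1");                       // Block 1 times out
    System.out.println(cache.put("block3", 300)); // true  (total 450)
  }
}
```

Under these assumptions, Block 3 only gets in once Block 1 has been dropped, which is the concern with a weight limit that is small relative to the largest blocks.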
   
   



