voonhous commented on code in PR #13724:
URL: https://github.com/apache/hudi/pull/13724#discussion_r2352521482
##########
hudi-common/src/main/java/org/apache/hudi/common/config/HoodieReaderConfig.java:
##########
@@ -89,4 +89,32 @@ public class HoodieReaderConfig extends HoodieConfig {
"hoodie.write.record.merge.custom.implementation.classes";
public static final String RECORD_MERGE_IMPL_CLASSES_DEPRECATED_WRITE_CONFIG_KEY = "hoodie.datasource.write.record.merger.impls";
+
+ public static final ConfigProperty<Boolean> HFILE_BLOCK_CACHE_ENABLED = ConfigProperty
+ .key("hoodie.hfile.block.cache.enabled")
+ .defaultValue(false)
+ .markAdvanced()
+ .sinceVersion("1.1.0")
+ .withDocumentation("Enable HFile block-level caching for metadata files. This caches frequently "
+ + "accessed HFile blocks in memory to reduce I/O operations during metadata queries. "
+ + "Improves performance for workloads with repeated metadata access patterns.");
+
+ public static final ConfigProperty<Integer> HFILE_BLOCK_CACHE_SIZE = ConfigProperty
+ .key("hoodie.hfile.block.cache.size")
+ .defaultValue(100)
Review Comment:
I tried implementing a size-based cache using Caffeine's `maximumWeight`
instead of `maximumSize`.
With a size-based limit, take a cache size of **1000** as an example.
Suppose we have 3 blocks:
Block 1: size 800
Block 2: size 150
Block 3: size 300
If we put Blocks 1, 2, and 3 in that order, only Blocks 1 and 2 will be
admitted into the cache.
Block 3 will always be rejected, because adding it would bring the total
size of the cache to **1250 (> 1000)**.
Block 3 will keep being rejected until Block 1 has gone unaccessed for the
expiry time limit (say 30 minutes).
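
To make the arithmetic concrete, here is a minimal sketch of the scenario above. It is a toy model that rejects any block that does not fit in the remaining weight budget, which is the behavior described in this comment. It is not Caffeine's actual admission or eviction logic, and `WeightBoundedCacheSketch`, `offer`, and `expire` are hypothetical names used only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of the scenario described above; NOT Caffeine's real policy.
// All names here are hypothetical and exist only for illustration.
class WeightBoundedCacheSketch {
  private final long maxWeight;
  private long currentWeight = 0;
  private final Map<String, Integer> weights = new LinkedHashMap<>();

  WeightBoundedCacheSketch(long maxWeight) {
    this.maxWeight = maxWeight;
  }

  // Admits the block only if it fits in the remaining weight budget.
  // Returns false when the block is rejected.
  boolean offer(String blockId, int weight) {
    if (weights.containsKey(blockId)) {
      return true; // already cached
    }
    if (currentWeight + weight > maxWeight) {
      return false; // would exceed the configured maximum weight
    }
    weights.put(blockId, weight);
    currentWeight += weight;
    return true;
  }

  // Stands in for time-based expiry removing a block that went unaccessed.
  void expire(String blockId) {
    Integer weight = weights.remove(blockId);
    if (weight != null) {
      currentWeight -= weight;
    }
  }

  long currentWeight() {
    return currentWeight;
  }

  public static void main(String[] args) {
    WeightBoundedCacheSketch cache = new WeightBoundedCacheSketch(1000);
    System.out.println(cache.offer("block1", 800)); // true, total 800
    System.out.println(cache.offer("block2", 150)); // true, total 950
    System.out.println(cache.offer("block3", 300)); // false, 950 + 300 = 1250 > 1000
    cache.expire("block1");                         // total drops to 150
    System.out.println(cache.offer("block3", 300)); // true, total 450
  }
}
```

In this model, Block 3 only gets in once Block 1 has been removed, which mirrors the 30-minute wait described above.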