voonhous opened a new pull request, #13724:
URL: https://github.com/apache/hudi/pull/13724
### Change Logs
Adds caching support for `HFileDataBlocks` using a Caffeine cache, without
relying on HBase dependencies.
Test results with `useBloomFilter=false` are shown below. The first line of
the block names the test that reproduces them:
```
org.apache.hudi.io.storage.TestHoodieNativeAvroHFileReaderCaching#testBlockCachePerformanceOnRecordLevelIndex
=== HFile BlockCache Performance Test ===
--- Testing 10000 Existing Key Lookups ---
10000 Existing Key Lookups:
- Without BlockCache: 30421 ms
- With BlockCache: 7628 ms
- Speedup: 3.99x
- Performance Improvement: 298.8%
--- Testing 10000 Missing Key Lookups ---
10000 Missing Key Lookups:
- Without BlockCache: 25265 ms
- With BlockCache: 5925 ms
- Speedup: 4.26x
- Performance Improvement: 326.4%
================================================================
```
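The derived figures in the log are consistent with the usual definitions: speedup as the ratio of uncached to cached time, and improvement as the time saved relative to the cached time. The helper below is a hypothetical sketch for checking that arithmetic; `BlockCacheSpeedup` and its methods are not part of the PR, and the test itself may compute these values differently.

```java
import java.util.Locale;

// Hypothetical helper (not from the PR): recomputes the derived figures in the log.
final class BlockCacheSpeedup {
    private BlockCacheSpeedup() {}

    /** Ratio of uncached to cached wall-clock time. */
    static double speedup(long withoutCacheMs, long withCacheMs) {
        return (double) withoutCacheMs / withCacheMs;
    }

    /** Time saved as a percentage of the cached time, i.e. (speedup - 1) * 100. */
    static double improvementPercent(long withoutCacheMs, long withCacheMs) {
        return (withoutCacheMs - withCacheMs) * 100.0 / withCacheMs;
    }

    /** Formats one summary line from a pair of timings. */
    static String summarize(String label, long withoutCacheMs, long withCacheMs) {
        return String.format(Locale.ROOT, "%s: %.2fx, %.1f%%", label,
            speedup(withoutCacheMs, withCacheMs),
            improvementPercent(withoutCacheMs, withCacheMs));
    }

    public static void main(String[] args) {
        // Timings copied from the log above.
        System.out.println(summarize("Existing keys", 30421, 7628));
        System.out.println(summarize("Missing keys", 25265, 5925));
    }
}
```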
### Impact
With `useBloomFilter=false`, which is the default configuration, users
### Risk level (write none, low medium or high below)
None. The caching-enabled reader extends the original implementation using a
decorator pattern, so users can revert to the previous behaviour by setting:
```
hoodie.metadata.hfile.block.cache.enabled=false
```
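To illustrate why the decorator approach keeps the risk low, here is a minimal sketch of the pattern. All names (`BlockReader`, `CachingBlockReader`, `BlockReaders`) are hypothetical and do not match the actual Hudi classes. A JDK-only `LinkedHashMap` LRU stands in for the Caffeine cache so the example runs without extra dependencies.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for the reader's block-loading API; the real Hudi interface differs. */
interface BlockReader {
    byte[] readBlock(long offset);
}

/** Decorator: same interface as the delegate, with a bounded LRU cache in front of it. */
final class CachingBlockReader implements BlockReader {
    private final BlockReader delegate;
    private final Map<Long, byte[]> cache;

    CachingBlockReader(BlockReader delegate, final int maxBlocks) {
        this.delegate = delegate;
        // Access-ordered map that drops the least recently used block once
        // the size limit is exceeded (a simple substitute for Caffeine).
        this.cache = new LinkedHashMap<Long, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
                return size() > maxBlocks;
            }
        };
    }

    @Override
    public synchronized byte[] readBlock(long offset) {
        byte[] block = cache.get(offset);
        if (block == null) {
            // Cache miss: fall through to the undecorated reader.
            block = delegate.readBlock(offset);
            cache.put(offset, block);
        }
        return block;
    }
}

/** Hypothetical factory: the flag decides whether the decorator is applied at all. */
final class BlockReaders {
    private BlockReaders() {}

    static BlockReader create(BlockReader base, boolean cacheEnabled, int cacheSize) {
        return cacheEnabled ? new CachingBlockReader(base, cacheSize) : base;
    }
}
```

With the flag set to `false`, the factory hands back the original reader untouched, which is the property that makes the change easy to revert.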
### Documentation Update
Three configs are added:
```
hoodie.metadata.hfile.block.cache.enabled=true (default)
hoodie.metadata.hfile.block.cache.size=100 (default)
hoodie.metadata.hfile.block.cache.ttl.minutes=60 (default)
public static final ConfigProperty<Boolean> METADATA_HFILE_BLOCK_CACHE_ENABLED = ConfigProperty
    .key(METADATA_PREFIX + ".hfile.block.cache.enabled")
    .defaultValue(true)
    .markAdvanced()
    .sinceVersion("1.1.0")
    .withDocumentation("Enable HFile block-level caching for metadata files. This caches frequently "
        + "accessed HFile blocks in memory to reduce I/O operations during metadata queries. "
        + "Improves performance for workloads with repeated metadata access patterns.");

public static final ConfigProperty<Integer> METADATA_HFILE_BLOCK_CACHE_SIZE = ConfigProperty
    .key(METADATA_PREFIX + ".hfile.block.cache.size")
    .defaultValue(100)
    .markAdvanced()
    .sinceVersion("1.1.0")
    .withDocumentation("Maximum number of HFile blocks to cache in memory per metadata file reader. "
        + "Higher values improve cache hit rates but consume more memory. "
        + "Only effective when hfile.block.cache.enabled is true.");

public static final ConfigProperty<Integer> METADATA_HFILE_BLOCK_CACHE_TTL_MINUTES = ConfigProperty
    .key(METADATA_PREFIX + ".hfile.block.cache.ttl.minutes")
    .defaultValue(60)
    .markAdvanced()
    .sinceVersion("1.1.0")
    .withDocumentation("Time-to-live (TTL) in minutes for cached HFile blocks. Blocks are evicted "
        + "from the cache after this duration to prevent memory leaks. "
        + "Only effective when hfile.block.cache.enabled is true.");
```
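As a rough illustration of how the three settings and their defaults fit together, the sketch below resolves them from a plain `java.util.Properties` object. `HFileBlockCacheSettings` is a hypothetical holder written for this example. Hudi resolves these values through its own `ConfigProperty` machinery, and the full key names assume `METADATA_PREFIX` is `hoodie.metadata`, as the keys listed above suggest.

```java
import java.util.Properties;

/** Hypothetical holder (not a Hudi class): resolves the three cache settings with their defaults. */
final class HFileBlockCacheSettings {
    // Key names assume METADATA_PREFIX == "hoodie.metadata".
    static final String ENABLED_KEY = "hoodie.metadata.hfile.block.cache.enabled";
    static final String SIZE_KEY = "hoodie.metadata.hfile.block.cache.size";
    static final String TTL_MINUTES_KEY = "hoodie.metadata.hfile.block.cache.ttl.minutes";

    final boolean enabled;
    final int maxBlocks;
    final int ttlMinutes;

    HFileBlockCacheSettings(Properties props) {
        // Defaults mirror the values declared in the ConfigProperty definitions above.
        this.enabled = Boolean.parseBoolean(props.getProperty(ENABLED_KEY, "true"));
        this.maxBlocks = Integer.parseInt(props.getProperty(SIZE_KEY, "100"));
        this.ttlMinutes = Integer.parseInt(props.getProperty(TTL_MINUTES_KEY, "60"));
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // Example override: a larger cache with a shorter TTL.
        props.setProperty(SIZE_KEY, "500");
        props.setProperty(TTL_MINUTES_KEY, "15");
        HFileBlockCacheSettings settings = new HFileBlockCacheSettings(props);
        System.out.println(settings.enabled + " " + settings.maxBlocks + " " + settings.ttlMinutes);
    }
}
```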
### Contributor's checklist
- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed