wombatu-kun opened a new pull request, #19021: URL: https://github.com/apache/hudi/pull/19021
### Describe the issue this Pull Request addresses

The native HFile reader's `HFileDataBlock.seekTo` is the hottest inner loop on the metadata-table read path (record-level index, bloom filter and column-stats point lookups, which run on essentially every write). For each entry it scanned it allocated a `KeyValue` and its `Key` just to compare the entry key against the lookup key and to compute the stride to the next entry, producing two short-lived objects per scanned entry and avoidable GC pressure under point-lookup workloads.

### Summary and Changelog

`HFileDataBlock.seekTo` now compares the entry key directly against the backing block buffer and computes the stride from the on-disk length fields, instead of materializing a `KeyValue`/`Key` for every scanned entry. A `KeyValue` is materialized only on an exact match. For the "in range" and end-of-block cases the cursor is pointed at the previous offset and the read is deferred, which `getKeyValue()` already performs lazily. The lookup key may be a `UTF8StringKey`, so its polymorphic content accessors are used for the comparison. No other class is touched and the original `Option`-based cursor is unchanged.

### Impact

No public API or on-disk format change. Lower-allocation, faster point lookups on the metadata-table read path.

JMH microbenchmark over an uncompressed HFile fixture (5000 entries, 625 sorted point lookups, `forks(0)`, `gc.alloc.rate.norm`): point lookups drop from 677,729 to 363,721 B/op (-46%) with throughput rising from 5.25 to 6.16 ops/ms (+17%); the full-scan path is unchanged (643,705 vs 643,681 B/op), confirming the change is isolated to `seekTo`.

### Risk Level

low. The change is confined to one method, preserves all `seekTo` return codes and the cursor's lazy-read semantics, and is exercised by the existing HFile reader suite (point, prefix, non-unique and fake-first-key seeks, sequential reads, empty file, and HBase read/write compatibility).
The full hudi-io module test suite (101 tests) and checkstyle pass.

### Documentation Update

none

### Contributor's checklist

- [x] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [x] Enough context is provided in the sections above
- [x] Adequate tests were added if applicable
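
### Illustrative sketch

The allocation-free scan described under Summary and Changelog can be illustrated with a minimal, self-contained sketch. It is not the code in this PR: it assumes a simplified entry layout of `[int keyLength][int valueLength][key bytes][value bytes]`, and the names `BlockSeekSketch`, `SeekResult`, `compareInPlace`, `seek` and `buildBlock` are hypothetical. It only shows the general idea of comparing keys in place against the backing buffer, deriving the stride from the length fields, and leaving the cursor on the previous entry when there is no exact match.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; the layout and all names are assumptions, not Hudi's API.
final class BlockSeekSketch {
  /** Outcome of a seek: whether the key matched, and the offset the cursor rests on. */
  static final class SeekResult {
    final boolean found;
    final int offset; // -1 means "before the first entry"
    SeekResult(boolean found, int offset) {
      this.found = found;
      this.offset = offset;
    }
  }

  /** Unsigned lexicographic comparison of block[off, off + len) with key, no copying. */
  static int compareInPlace(byte[] block, int off, int len, byte[] key) {
    int n = Math.min(len, key.length);
    for (int i = 0; i < n; i++) {
      int a = block[off + i] & 0xFF;
      int b = key[i] & 0xFF;
      if (a != b) {
        return a - b;
      }
    }
    return len - key.length;
  }

  /** Scans sorted entries in block[0, end) without creating a per-entry object. */
  static SeekResult seek(byte[] block, int end, byte[] lookupKey) {
    ByteBuffer buf = ByteBuffer.wrap(block); // one wrapper per seek, not per entry
    int offset = 0;
    int previous = -1;
    while (offset < end) {
      int keyLen = buf.getInt(offset);       // absolute read, big-endian by default
      int valueLen = buf.getInt(offset + 4);
      int cmp = compareInPlace(block, offset + 8, keyLen, lookupKey);
      if (cmp == 0) {
        return new SeekResult(true, offset);     // exact match
      }
      if (cmp > 0) {
        return new SeekResult(false, previous);  // lookup key falls before this entry
      }
      previous = offset;
      offset += 8 + keyLen + valueLen;           // stride from the length fields
    }
    return new SeekResult(false, previous);      // reached the end of the block
  }

  /** Builds a block in the assumed layout from {key, value} string pairs. */
  static byte[] buildBlock(String[][] entries) {
    int size = 0;
    for (String[] e : entries) {
      size += 8 + e[0].getBytes(StandardCharsets.UTF_8).length
          + e[1].getBytes(StandardCharsets.UTF_8).length;
    }
    ByteBuffer out = ByteBuffer.allocate(size);
    for (String[] e : entries) {
      byte[] k = e[0].getBytes(StandardCharsets.UTF_8);
      byte[] v = e[1].getBytes(StandardCharsets.UTF_8);
      out.putInt(k.length).putInt(v.length).put(k).put(v);
    }
    return out.array();
  }

  public static void main(String[] args) {
    byte[] block = buildBlock(new String[][] {{"a", "1"}, {"c", "22"}, {"e", "333"}});
    SeekResult r = seek(block, block.length, "c".getBytes(StandardCharsets.UTF_8));
    System.out.println("found=" + r.found + " offset=" + r.offset);
  }
}
```

In this sketch the only object created per seek is the result (plus one buffer wrapper), which mirrors the intent stated above of materializing an entry only when it is needed rather than for every scanned entry.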
