cshuo opened a new pull request, #18936: URL: https://github.com/apache/hudi/pull/18936
### Describe the issue this Pull Request addresses

HFile log data blocks did not write bloom filter metadata, while ordinary HFile base files always did. This prevented metadata table partitions such as the Record Level Index from using bloom filters for point lookups. This PR adds a unified configuration switch for bloom filter writes across HFile base files and log blocks.

### Summary and Changelog

- Add `hoodie.hfile.bloom.filter.enabled`, enabled by default.
- Propagate HFile bloom settings through metadata write configuration and `HoodieAppendHandle`.
- Write bloom filter, type, and min/max key metadata into HFile log blocks.
- Allow HFile readers to fall back to point seeks when bloom metadata is absent.
- Apply the configuration to ordinary HFile base-file writers.
- Add unit coverage for configuration propagation, serialization, and reader behavior.
- Add a Flink end-to-end test verifying bloom metadata and RLI lookups.

### Impact

HFile log blocks can now contain bloom filters, improving metadata table point lookups. Bloom writes can be disabled explicitly.

### Risk Level

Medium. This changes HFile metadata serialization and reader behavior across common, client, Hadoop, and Flink modules. Unit and Flink end-to-end tests cover enabled and disabled configurations, metadata contents, point-lookup fallback, and RLI queries.

### Documentation Update

Document the new `hoodie.hfile.bloom.filter.enabled` configuration. It defaults to `true` to preserve existing HFile base-file behavior.

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Enough context is provided in the sections above
- [ ] Adequate tests were added if applicable
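### Illustrative sketch (not part of the diff)

The reader fallback listed in the changelog can be pictured with a toy model. The following is a minimal sketch under stated assumptions, not the code in this PR: `BloomFallbackSketch`, `ToyBloom`, and `ToyBlock` are hypothetical names, a `TreeMap` stands in for sorted HFile records, and the bloom filter is a simplified two-probe bit set. Only the configuration key and its default of `true` come from the description above.

```java
import java.util.BitSet;
import java.util.Optional;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical illustration only; none of these types are Hudi classes.
class BloomFallbackSketch {
    // Configuration key named in this PR; the description says it defaults to true.
    static final String BLOOM_ENABLED_KEY = "hoodie.hfile.bloom.filter.enabled";

    // Reads the switch from plain properties, treating a missing key as enabled.
    static boolean bloomEnabled(Properties props) {
        return Boolean.parseBoolean(props.getProperty(BLOOM_ENABLED_KEY, "true"));
    }

    /** Toy bloom filter: two hash probes over a fixed-size bit set. */
    static final class ToyBloom {
        private static final int BITS = 1024;
        private final BitSet bits = new BitSet(BITS);

        void add(String key) {
            bits.set(index(key, 17));
            bits.set(index(key, 31));
        }

        // May return a false positive, never a false negative.
        boolean mightContain(String key) {
            return bits.get(index(key, 17)) && bits.get(index(key, 31));
        }

        private static int index(String key, int seed) {
            return Math.floorMod(key.hashCode() * seed + seed, BITS);
        }
    }

    /** Toy sorted block; a null bloom models a block written without bloom metadata. */
    static final class ToyBlock {
        private final TreeMap<String, String> records = new TreeMap<>();
        private final ToyBloom bloom;
        int seeks = 0; // counts point seeks actually performed

        ToyBlock(boolean writeBloom) {
            this.bloom = writeBloom ? new ToyBloom() : null;
        }

        void put(String key, String value) {
            records.put(key, value);
            if (bloom != null) {
                bloom.add(key);
            }
        }

        Optional<String> lookup(String key) {
            // With bloom metadata, a definite miss skips the seek entirely.
            if (bloom != null && !bloom.mightContain(key)) {
                return Optional.empty();
            }
            // Without bloom metadata (or on a possible hit), fall back to a point seek.
            seeks++;
            return Optional.ofNullable(records.get(key));
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(BLOOM_ENABLED_KEY, "false"); // explicitly disable bloom writes
        ToyBlock block = new ToyBlock(bloomEnabled(props));
        block.put("k1", "v1");
        System.out.println(block.lookup("k1").orElse("<missing>"));
        System.out.println("seeks=" + block.seeks);
    }
}
```

In this model a block written with the switch disabled still answers every lookup correctly; it just pays for a seek on each miss, which is the trade-off the configuration exposes.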
