cshuo opened a new pull request, #19026: URL: https://github.com/apache/hudi/pull/19026
### Describe the issue this Pull Request addresses

Partitioned record-level index lookups in metadata should still match against the full record key within the selected data partition. The previous lookup path filtered records using prefix matching after narrowing to a single shard, which could allow a prefix-only key to match an existing record incorrectly. This PR fixes the partitioned RLI metadata read path to use full-key matching and adds a Spark functional test that verifies valid keys are returned while non-existent prefix-only keys do not match.

### Summary and Changelog

- Update `HoodieBackedTableMetadata` to read partitioned record index entries with full-key filtering when resolving a single file slice.
- Add `testPartitionedRecordLevelIndexLookupUsesFullKey` in `TestRecordLevelIndex.scala` to cover partitioned RLI reads with `GLOBAL_RECORD_LEVEL_INDEX_ENABLE=false` and `RECORD_LEVEL_INDEX_ENABLE=true`.
- Verify both the positive path for real record keys and the negative path for a prefix-only key that should return no matches.

### Impact

- **Functional impact**: Fixes incorrect prefix-based matches during partitioned record-level index lookup in metadata.
- **Maintainability**: Tightens the lookup contract in the metadata reader and documents the expected behavior with a targeted regression test.
- **Extensibility**: Reduces ambiguity for future changes around partitioned RLI sharding and lookup semantics by anchoring behavior in test coverage.

### Risk Level

low. The production change is a one-line behavioral fix in the partitioned RLI metadata lookup path, and it is covered by a focused functional regression test in the Spark datasource test suite.
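
To illustrate the difference between the two matching modes described above, here is a minimal Java sketch. It is not the actual `HoodieBackedTableMetadata` code: it assumes a shard can be modeled as a plain list of stored record keys, and `KeyMatchSketch`, `lookup`, `prefixLookup`, and `fullKeyLookup` are hypothetical names introduced only for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

// Hypothetical illustration only; none of these names exist in Hudi.
class KeyMatchSketch {

    // Return every stored key in the shard that matches at least one lookup key
    // under the supplied matching rule.
    static List<String> lookup(List<String> shardKeys,
                               List<String> lookupKeys,
                               BiPredicate<String, String> matcher) {
        List<String> hits = new ArrayList<>();
        for (String stored : shardKeys) {
            for (String wanted : lookupKeys) {
                if (matcher.test(stored, wanted)) {
                    hits.add(stored);
                    break; // record each stored key at most once
                }
            }
        }
        return hits;
    }

    // Prefix matching: a lookup key matches any stored key that starts with it.
    static List<String> prefixLookup(List<String> shardKeys, List<String> lookupKeys) {
        return lookup(shardKeys, lookupKeys, (stored, wanted) -> stored.startsWith(wanted));
    }

    // Full-key matching: a lookup key matches only an identical stored key.
    static List<String> fullKeyLookup(List<String> shardKeys, List<String> lookupKeys) {
        return lookup(shardKeys, lookupKeys, (stored, wanted) -> stored.equals(wanted));
    }

    public static void main(String[] args) {
        List<String> shard = List.of("row1", "row10", "row2");

        // "ro" is not a real record key, yet prefix matching returns all three records.
        System.out.println(prefixLookup(shard, List.of("ro")));   // [row1, row10, row2]

        // Full-key matching returns nothing for the prefix-only key...
        System.out.println(fullKeyLookup(shard, List.of("ro")));  // []

        // ...and still returns the record for a real key.
        System.out.println(fullKeyLookup(shard, List.of("row1"))); // [row1]
    }
}
```

The sketch mirrors the two cases the regression test is described as covering: real record keys are still returned, while a prefix-only key that does not exist yields no matches.
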
### Documentation Update

none

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Enough context is provided in the sections above
- [ ] Adequate tests were added if applicable
