yihua opened a new pull request, #17797:
URL: https://github.com/apache/hudi/pull/17797
### Describe the issue this Pull Request addresses

When using the `COMMIT_TIME_ORDERING` merge mode with global indexes (RLI, Global Simple, Global Bloom) on MOR tables, the current implementation reads older record versions from the latest file slice and merges with them during the tagging phase. This is unnecessary: with commit time ordering, the newer commit always wins, so there is no need to compare event-time values.

This PR optimizes the `tagGlobalLocationBackToRecords` method in `HoodieIndexUtils` to skip the merge phase for `COMMIT_TIME_ORDERING` on MOR tables, reducing I/O overhead.

### Summary and Changelog

Performance optimization for global indexes with `COMMIT_TIME_ORDERING`:

- Modified `HoodieIndexUtils.tagGlobalLocationBackToRecords()` to skip merging with older record versions when `COMMIT_TIME_ORDERING` is used on MOR tables.
- The optimization is safe because commit time ordering semantics guarantee that the newer commit always overwrites the older record, regardless of event-time field values.
- Partition path updates still work correctly because `shouldUpdatePartitionPath` is checked independently of the merge.
- Delete operations still work correctly because a delete marker from a later commit naturally overrides any older record.

Added a comprehensive test suite, `TestGlobalIndexCommitTimeOrdering.java`, covering:

- Basic upserts with lower, higher, and equal event times
- Partition path updates with commit time ordering
- Delete operations
- Deletes with an unknown partition (global index lookup)
- Mixed operations (inserts, updates, deletes)
- Compaction behavior on MOR tables

### Impact

Reduced I/O during write operations when using global indexes (RLI, Global Simple, Global Bloom) with `COMMIT_TIME_ORDERING` on MOR tables.
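The tagging shortcut described in the summary can be sketched as a small, self-contained model. This is an illustration under stated assumptions, not Hudi's implementation: `CommitTimeOrderingSketch`, `needsMergeWithExisting`, `tagWithoutMerge`, the `Rec` record, and the string-encoded write actions are all hypothetical, and the enums only mirror the merge mode and table type names used in this PR. The point it shows is that once the merge is skipped, the event-time value is never consulted and only the `shouldUpdatePartitionPath` routing decision remains.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of global-index tagging under
// COMMIT_TIME_ORDERING. None of these types or methods are Hudi APIs.
public class CommitTimeOrderingSketch {

  enum MergeMode { COMMIT_TIME_ORDERING, EVENT_TIME_ORDERING }

  enum TableType { COPY_ON_WRITE, MERGE_ON_READ }

  // One version of a record: key, partition path, event-time value, delete flag.
  record Rec(String key, String partition, long eventTime, boolean isDelete) {}

  // Whether tagging must read the existing version from the latest file slice
  // and merge it with the incoming record. Following the PR, only MOR tables
  // using COMMIT_TIME_ORDERING skip it; every other combination keeps the merge here.
  static boolean needsMergeWithExisting(MergeMode mode, TableType type) {
    return !(mode == MergeMode.COMMIT_TIME_ORDERING && type == TableType.MERGE_ON_READ);
  }

  // Tagging for an incoming record whose key the global index found in
  // existingPartition, with the merge step skipped. The incoming record wins
  // because it belongs to the later commit, so eventTime is never compared.
  // Returns the writes to issue, encoded as strings for readability.
  static List<String> tagWithoutMerge(Rec incoming, String existingPartition,
                                      boolean shouldUpdatePartitionPath) {
    List<String> writes = new ArrayList<>();
    boolean partitionChanged = !incoming.partition().equals(existingPartition);
    if (incoming.isDelete()) {
      // A delete (even with an unknown partition) targets the indexed location.
      writes.add("DELETE " + incoming.key() + " @ " + existingPartition);
    } else if (partitionChanged && shouldUpdatePartitionPath) {
      // Move the record: delete from the old partition, insert into the new one.
      writes.add("DELETE " + incoming.key() + " @ " + existingPartition);
      writes.add("INSERT " + incoming.key() + " @ " + incoming.partition());
    } else {
      // Update at the indexed location, whatever the incoming event time is.
      writes.add("UPDATE " + incoming.key() + " @ " + existingPartition);
    }
    return writes;
  }

  public static void main(String[] args) {
    System.out.println(needsMergeWithExisting(MergeMode.COMMIT_TIME_ORDERING, TableType.MERGE_ON_READ));
    // An event time lower than the stored version's still yields an update.
    Rec late = new Rec("k1", "2024/01/01", 5L, false);
    System.out.println(tagWithoutMerge(late, "2024/01/01", true));
    // A partition path change with shouldUpdatePartitionPath = true.
    Rec moved = new Rec("k1", "2024/01/02", 1L, false);
    System.out.println(tagWithoutMerge(moved, "2024/01/01", true));
  }
}
```

Running `main` prints `true`, then `[UPDATE k1 @ 2024/01/01]`, then `[DELETE k1 @ 2024/01/01, INSERT k1 @ 2024/01/02]`. Keeping `needsMergeWithExisting` separate from the routing logic mirrors the PR's claim that partition path handling is checked independently of the merge.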
### Risk Level

low

### Documentation Update

none

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Enough context is provided in the sections above
- [ ] Adequate tests were added if applicable
