yihua opened a new pull request, #17797:
URL: https://github.com/apache/hudi/pull/17797

   ### Describe the issue this Pull Request addresses
   
   When using `COMMIT_TIME_ORDERING` merge mode with global indexes (RLI, 
Global Simple, Global Bloom) on MOR tables, the current implementation 
unnecessarily reads and merges with older record versions (by reading the 
latest file slice) during the tagging phase. This is wasteful because, with 
commit time ordering, the newer commit always wins, so there is no need to 
compare event timestamps.
   
   This PR optimizes the `tagGlobalLocationBackToRecords` method in 
`HoodieIndexUtils` to skip the merge phase for `COMMIT_TIME_ORDERING` mode on 
MOR tables, reducing I/O overhead.
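   
   The condition described above can be sketched roughly as follows. This is 
an illustration only, not the code in this PR: the class, the enums, and 
`canSkipMergeDuringTagging` are hypothetical names defined locally for the 
example, and the real check inside `HoodieIndexUtils` may be structured 
differently.

```java
public class MergeSkipSketch {
    // Hypothetical local enums for illustration; they are not Hudi classes.
    enum MergeMode { COMMIT_TIME_ORDERING, EVENT_TIME_ORDERING, CUSTOM }
    enum TableType { COPY_ON_WRITE, MERGE_ON_READ }

    /**
     * Returns true only for the combination this description covers:
     * commit time ordering on a MOR table. Every other combination
     * keeps the existing read-and-merge path in this sketch.
     */
    static boolean canSkipMergeDuringTagging(MergeMode mode, TableType type) {
        return mode == MergeMode.COMMIT_TIME_ORDERING
            && type == TableType.MERGE_ON_READ;
    }

    public static void main(String[] args) {
        System.out.println(canSkipMergeDuringTagging(
            MergeMode.COMMIT_TIME_ORDERING, TableType.MERGE_ON_READ));
        System.out.println(canSkipMergeDuringTagging(
            MergeMode.EVENT_TIME_ORDERING, TableType.MERGE_ON_READ));
    }
}
```

   When the check returns true, the tagging phase would not need to read the 
latest file slice for the matched keys.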
   
   ### Summary and Changelog
   
   Performance optimization for global index with `COMMIT_TIME_ORDERING`:
   
   - Modified `HoodieIndexUtils.tagGlobalLocationBackToRecords()` to skip 
merging with older record versions when `COMMIT_TIME_ORDERING` is used on MOR 
tables
   - The optimization is safe because commit time ordering semantics guarantee 
that the newer commit always overwrites the older record, regardless of event 
time field values
   - Partition path updates still work correctly because 
`shouldUpdatePartitionPath` is checked independently
   - Delete operations work correctly because the delete marker from a later 
commit naturally overrides any older record
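   
   As a rough illustration of why skipping the read is safe, the sketch below 
models commit time ordering with a hypothetical `Rec` type and `resolve` 
method (neither is a Hudi API). The existing record is never inspected, so 
loading it would not change the outcome.

```java
import java.util.Optional;

public class CommitTimeOrderingSketch {
    // Hypothetical minimal record: key, event-time field, and a delete flag.
    static final class Rec {
        final String key;
        final long eventTime;
        final boolean isDelete;

        Rec(String key, long eventTime, boolean isDelete) {
            this.key = key;
            this.eventTime = eventTime;
            this.isDelete = isDelete;
        }
    }

    /**
     * Commit time ordering: the incoming record comes from the newer commit
     * and always wins. The existing record and both event times are ignored.
     * An incoming delete marker leaves no surviving record.
     */
    static Optional<Rec> resolve(Rec existing, Rec incoming) {
        return incoming.isDelete ? Optional.empty() : Optional.of(incoming);
    }

    public static void main(String[] args) {
        Rec stored = new Rec("k1", 100L, false);
        Rec lateArrival = new Rec("k1", 50L, false); // lower event time
        // The newer commit still wins despite the lower event time.
        System.out.println(resolve(stored, lateArrival).get().eventTime);
        // A delete from the newer commit removes the record.
        System.out.println(resolve(stored, new Rec("k1", 10L, true)).isPresent());
    }
}
```

   Under event time ordering, by contrast, `resolve` would have to compare the 
two `eventTime` values, which is what forces the read of the older version.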
   
   Added a test suite, `TestGlobalIndexCommitTimeOrdering.java`, covering:
   - Basic upserts with lower/higher/equal event times
   - Partition path updates with commit time ordering
   - Delete operations
   - Deletes with unknown partition (global index lookup)
   - Mixed operations (inserts, updates, deletes)
   - Compaction behavior on MOR tables
   
   ### Impact
   
   Reduced I/O during write operations when using global indexes (RLI, Global 
Simple, Global Bloom) with `COMMIT_TIME_ORDERING` on MOR tables.
   
   ### Risk Level
   
   low
   
   ### Documentation Update
   
   none
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Enough context is provided in the sections above
   - [ ] Adequate tests were added if applicable
   