Re: [PR] [HUDI-9424] Extract and test merger logic for FileGroupReader [hudi]

via GitHub Sat, 24 May 2025 02:08:11 -0700


danny0405 commented on PR #13242:
URL: https://github.com/apache/hudi/pull/13242#issuecomment-2906665356


   I tried to push some high-level changes to the code:
   
   - move the enablePartialMerging inside the `EngineRecordMerger` and add an 
enable method for it so that there is no need to pass it every time for 
`#merge` ;
   - add `merge(Option, T)` for log/log merging and `merge(T, Option)` for 
base/log merging, the assumption for null is different for these 2 scenarios;
   
   Then I try to inspect the logic changes and found several issues that made 
me quit:
   
   1. for log/log merging with `CUSTOM` merge mode, before the change, there is 
no need to do `combined record -> buffer record` conversion if the new has 
lower ordering value;
   2. for log/log merging with `CUSTOM` merge mode, before the change, the log 
records would be evolved uniformly in the 
`FileGroupRecordBuffer#getRecordsIterator` , now they evolved in the merging 
phase, which incur unnecessary performance regression;
   3. the `getNewerRecordWithEventTimeOrdering` semantics has been changed to 
be wrong, because a newer record with bigger ordering value would be ignored;
   4. In `BufferedRecord#forDeleteRecord `, the ordering value is converted to 
engine type which is wrong because we should use Java type to keep engine 
agnostic;
   
   It looks like you do not have good knowledge of these nuances and my 
suggestion is we do not change the logic and api first and just move them 
together for easy testing.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] [HUDI-9424] Extract and test merger logic for FileGroupReader [hudi]

Reply via email to