danny0405 commented on PR #13242: URL: https://github.com/apache/hudi/pull/13242#issuecomment-2906665356
I tried to push some high-level changes to the code: - move the enablePartialMerging inside the `EngineRecordMerger` and add an enable method for it so that there is no need to pass it every time for `#merge` ; - add `merge(Option, T)` for log/log merging and `merge(T, Option)` for base/log merging, the assumption for null is different for these 2 scenarios; Then I try to inspect the logic changes and found several issues that made me quit: 1. for log/log merging with `CUSTOM` merge mode, before the change, there is no need to do `combined record -> buffer record` conversion if the new has lower ordering value; 2. for log/log merging with `CUSTOM` merge mode, before the change, the log records would be evolved uniformly in the `FileGroupRecordBuffer#getRecordsIterator` , now they evolved in the merging phase, which incur unnecessary performance regression; 3. the `getNewerRecordWithEventTimeOrdering` semantics has been changed to be wrong, because a newer record with bigger ordering value would be ignored; 4. In `BufferedRecord#forDeleteRecord `, the ordering value is converted to engine type which is wrong because we should use Java type to keep engine agnostic; It looks like you do not have good knowledge of these nuances and my suggestion is we do not change the logic and api first and just move them together for easy testing. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
