nsivabalan commented on issue #3323: URL: https://github.com/apache/hudi/issues/3323#issuecomment-884638185
I dug deeper and found the root cause. It looks like a bug in the code, though I'm surprised we haven't run into it before. When we reconstruct records from disk (log blocks), we use [reflection to instantiate the payload](https://github.com/apache/hudi/blob/5a94b6bf54b18739da55ebde10adf93f133e3204/hudi-common/src/main/java/org/apache/hudi/common/util/SpillableMapUtils.java#L116). `OverwriteWithLatestAvroPayload` has two constructors: one takes the ordering field value, while the second does not and assumes natural ordering (it sets 0 as the preCombine value). The reflective call goes through the constructor without the ordering value, so every record rebuilt this way has the same preCombine value. When two such records are merged, the tie is decided by read order rather than by the ordering field, which is the discrepancy we see.

In case you are wondering why snapshot reads return correct results: a snapshot read processes log blocks in reverse, so the latest record always gets picked. An incremental read processes log blocks from start to end, so the first record gets picked.
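The interaction between the defaulted ordering value and read order can be illustrated with a minimal Java sketch under a simplified merge model. `Payload`, `LogRecord`, `fromDisk`, `incrementalRead`, and `snapshotRead` are hypothetical names, not Hudi APIs, and the tie rule (the record read earlier wins when ordering values are equal) is an assumption chosen to mirror the behavior described above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class PayloadOrderingDemo {

    // Hypothetical stand-in for a payload with two constructors, loosely modeled on
    // the description of OverwriteWithLatestAvroPayload (not Hudi's actual class).
    public static class Payload {
        final String value;
        final int orderingVal;

        // Carries the preCombine (ordering) field value.
        public Payload(String value, int orderingVal) {
            this.value = value;
            this.orderingVal = orderingVal;
        }

        // Assumes natural ordering: the ordering value defaults to 0.
        public Payload(String value) {
            this(value, 0);
        }

        // Keep the payload with the larger ordering value; on a tie keep "this".
        public Payload preCombine(Payload other) {
            return other.orderingVal > this.orderingVal ? other : this;
        }
    }

    // Hypothetical record as written to a log block: a value plus its ordering field.
    public static class LogRecord {
        final String value;
        final int ts;

        public LogRecord(String value, int ts) {
            this.value = value;
            this.ts = ts;
        }
    }

    // Re-instantiate a payload reflectively. With keepOrdering == false only the value
    // is passed, so the one-arg constructor runs and every payload gets orderingVal 0.
    static Payload fromDisk(LogRecord r, boolean keepOrdering) {
        try {
            if (keepOrdering) {
                return Payload.class.getConstructor(String.class, int.class).newInstance(r.value, r.ts);
            }
            return Payload.class.getConstructor(String.class).newInstance(r.value);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Merge all records for one key in the given read order; the earlier record is "this".
    static String merge(List<LogRecord> readOrder, boolean keepOrdering) {
        Payload current = null;
        for (LogRecord r : readOrder) {
            Payload next = fromDisk(r, keepOrdering);
            current = (current == null) ? next : current.preCombine(next);
        }
        return current == null ? null : current.value;
    }

    // Incremental-style read: log blocks from oldest to newest.
    public static String incrementalRead(List<LogRecord> writes, boolean keepOrdering) {
        return merge(writes, keepOrdering);
    }

    // Snapshot-style read: log blocks from newest to oldest.
    public static String snapshotRead(List<LogRecord> writes, boolean keepOrdering) {
        List<LogRecord> reversed = new ArrayList<>(writes);
        Collections.reverse(reversed);
        return merge(reversed, keepOrdering);
    }

    public static void main(String[] args) {
        // Three updates to the same key, written in this order with increasing ordering values.
        List<LogRecord> writes = List.of(
                new LogRecord("v1", 1), new LogRecord("v2", 2), new LogRecord("v3", 3));
        System.out.println(incrementalRead(writes, false)); // v1 (first record wins the tie)
        System.out.println(snapshotRead(writes, false));    // v3
        System.out.println(incrementalRead(writes, true));  // v3
        System.out.println(snapshotRead(writes, true));     // v3
    }
}
```

With `keepOrdering` false, every payload carries ordering value 0, so the merged result depends only on the order in which log blocks are read. When the ordering value survives the round trip through the two-argument constructor, both read paths agree on the latest write.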