nsivabalan commented on issue #3323:
URL: https://github.com/apache/hudi/issues/3323#issuecomment-884638185


   I dug deeper and found the root cause. It looks like a bug in the code, 
though I am surprised we have not run into it before. 
   When we reconstruct records from disk (log blocks), we use 
[reflection to instantiate the 
payload](https://github.com/apache/hudi/blob/5a94b6bf54b18739da55ebde10adf93f133e3204/hudi-common/src/main/java/org/apache/hudi/common/util/SpillableMapUtils.java#L116).
 OverwriteWithLatestAvroPayload has two constructors: one takes the ordering 
field value, while the other does not and assumes natural ordering (it sets 0 
as the preCombine value). 
   
   Hence when two records are merged, we see the discrepancy. 
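
   To make the discrepancy concrete, here is a minimal sketch. `SketchPayload` is a hypothetical, simplified stand-in and not Hudi's actual class; it only assumes what is described above: one constructor keeps the ordering value, the other defaults it to 0, and preCombine keeps the payload with the greater ordering value.

```java
// Hypothetical, simplified stand-in for a payload class; NOT Hudi's actual code.
class SketchPayload {
    final String data;
    final long orderingVal;

    // Constructor that carries the ordering (preCombine) value.
    SketchPayload(String data, long orderingVal) {
        this.data = data;
        this.orderingVal = orderingVal;
    }

    // Constructor without an ordering value: falls back to 0.
    SketchPayload(String data) {
        this(data, 0L);
    }

    // Keep the payload with the greater ordering value; on a tie, keep `this`.
    SketchPayload preCombine(SketchPayload another) {
        return another.orderingVal > this.orderingVal ? another : this;
    }

    public static void main(String[] args) {
        // With real ordering values, the result does not depend on call order.
        SketchPayload a = new SketchPayload("first", 5L);
        SketchPayload b = new SketchPayload("second", 3L);
        System.out.println(a.preCombine(b).data + " / " + b.preCombine(a).data);

        // Rebuilt without ordering values, both are 0 and call order decides.
        SketchPayload ra = new SketchPayload("first");
        SketchPayload rb = new SketchPayload("second");
        System.out.println(ra.preCombine(rb).data + " / " + rb.preCombine(ra).data);
    }
}
```

   Once both payloads carry 0, the merge result is decided by which side preCombine is called on rather than by the ordering field.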
   
   In case you are wondering why snapshot read gives correct results, 
here is the reason. 
   Snapshot read scans log blocks in reverse, so the latest record always gets 
picked. 
   Incremental read, in contrast, scans log blocks from start to end, so the 
first record gets picked. 
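
   The effect of scan direction can be sketched as follows. This is an illustration under stated assumptions, not Hudi's reader code: `ReadOrderSketch`, `Rec`, and `pick` are hypothetical names, and the tie-breaking rule (an incoming record replaces the current one only if its ordering value is strictly greater) is assumed for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch; names and tie-breaking rule are assumptions, not Hudi code.
class ReadOrderSketch {
    // A record version plus the ordering value it was rebuilt with.
    static final class Rec {
        final String value;
        final long orderingVal;

        Rec(String value, long orderingVal) {
            this.value = value;
            this.orderingVal = orderingVal;
        }
    }

    // Scan in the given order; replace the winner only on a strictly greater
    // ordering value, so on ties the first record encountered is kept.
    static String pick(List<Rec> scanOrder) {
        Rec winner = null;
        for (Rec r : scanOrder) {
            if (winner == null || r.orderingVal > winner.orderingVal) {
                winner = r;
            }
        }
        return winner == null ? null : winner.value;
    }

    // Three versions of one key in write order, all rebuilt with ordering value 0.
    static List<Rec> writeOrder() {
        return Arrays.asList(new Rec("v1", 0L), new Rec("v2", 0L), new Rec("v3", 0L));
    }

    // Copy of the input in reverse order (models scanning log blocks backwards).
    static List<Rec> reversed(List<Rec> in) {
        List<Rec> copy = new ArrayList<>(in);
        Collections.reverse(copy);
        return copy;
    }

    public static void main(String[] args) {
        System.out.println("start-to-end scan picks: " + pick(writeOrder()));
        System.out.println("reverse scan picks: " + pick(reversed(writeOrder())));
    }
}
```

   With every ordering value equal, the reverse scan meets the latest version first and keeps it, while the start-to-end scan keeps the earliest one.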


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

