stayrascal commented on pull request #4724:
URL: https://github.com/apache/hudi/pull/4724#issuecomment-1057603548


   yeah, @LinMingQiang has mentioned this one above.
   
   From my understanding, if we want to enable the "partial update" feature by 
defining a customized payload class, the "partial update" logic should run in 
these three cases:
   -  merging the incoming batch of records before writing to disk
   -  reading records from the log file (reading from a MOR table)
   -  reading records from the log file and compacting them into the base file
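
   To make the intent concrete, here is a minimal sketch of the field-level 
merge that all three paths would need to apply. It is plain Java, not Hudi's 
payload API: `partialMerge` and the map-based record are hypothetical, and 
treating `null` as "field not provided" is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PartialUpdateSketch {

    // Hypothetical helper (not a Hudi API): start from the older record and
    // overwrite only the fields that the newer record actually provides.
    // Assumption: a null value in the newer record means "not provided".
    static Map<String, Object> partialMerge(Map<String, Object> older,
                                            Map<String, Object> newer) {
        Map<String, Object> merged = new LinkedHashMap<>(older);
        for (Map.Entry<String, Object> e : newer.entrySet()) {
            if (e.getValue() != null) {
                merged.put(e.getKey(), e.getValue());
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Object> older = new LinkedHashMap<>();
        older.put("id", 1);
        older.put("name", "a");
        older.put("price", null);

        Map<String, Object> newer = new LinkedHashMap<>();
        newer.put("id", 1);
        newer.put("name", null);
        newer.put("price", 9.5);

        // "name" is kept from the older record, "price" comes from the newer one.
        System.out.println(partialMerge(older, newer));
    }
}
```

   If any one of the three paths skips this merge and only keeps the newer 
record, the fields that record left empty are lost on that path.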
   
   So I also updated `HoodieMergedLogRecordScanner.processNextRecord()` to 
pass the Schema info. If the use case is not "partial update" and uses another 
payload class, it will fall back to the default `preCombine()` logic and choose 
the "recent" one.
   
   The open question is whether `preCombine` should be treated as returning 
one of the two records, or whether it may merge them into a new record.
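
   The two readings can be contrasted with a small sketch. Everything here is 
hypothetical (the `Rec` class, `pickLatest`, `mergeFields`, and the use of a 
numeric ordering value); it only illustrates the difference in semantics and 
is not how Hudi's payload classes are implemented.

```java
import java.util.HashMap;
import java.util.Map;

public class PreCombineSemantics {

    // Hypothetical record: an ordering value plus nullable field values.
    static final class Rec {
        final long orderingVal;
        final Map<String, Object> fields;

        Rec(long orderingVal, Map<String, Object> fields) {
            this.orderingVal = orderingVal;
            this.fields = fields;
        }
    }

    // Reading 1: preCombine returns one of its two inputs unchanged.
    // On a tie, the second argument wins.
    static Rec pickLatest(Rec a, Rec b) {
        return b.orderingVal >= a.orderingVal ? b : a;
    }

    // Reading 2: preCombine builds a new record, taking non-null fields
    // from the newer input and everything else from the older one.
    static Rec mergeFields(Rec a, Rec b) {
        Rec older = a.orderingVal <= b.orderingVal ? a : b;
        Rec newer = (older == a) ? b : a;
        Map<String, Object> merged = new HashMap<>(older.fields);
        newer.fields.forEach((k, v) -> {
            if (v != null) {
                merged.put(k, v);
            }
        });
        return new Rec(newer.orderingVal, merged);
    }
}
```

   Under the first reading, any field the winning record left as `null` stays 
`null` in the result; under the second, it is filled in from the other record, 
which is what "partial update" needs.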
   
   > Just FYI for all interested folks. Precombine is not just used to dedup 
two records within same incoming batch, but also to deduce the winner when we 
[merge records in 
LogRecordReader.](https://github.com/apache/hudi/blob/907e60c252e80be5ef3a848d773e0f866eb609f9/hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieMergedLogRecordScanner.java#L145)
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]