stayrascal commented on pull request #4724: URL: https://github.com/apache/hudi/pull/4724#issuecomment-1057603548
> Yeah, @LinMingQiang has mentioned this above. From my understanding, if we want to enable the "partial update" feature by defining a customized payload class, the partial-update merge has to run in these three cases:
> - merging the incoming batch of records before writing to disk
> - reading records from the log files (reading from a MOR table)
> - reading records from the log files and compacting them into the base file
>
> So I also updated `HoodieMergedLogRecordScanner.processNextRecord()` to pass the schema info. If the use case relies on another payload class that does not do "partial update", it will use the default `preCombine()` logic and choose the more "recent" record. The open question is whether `preCombine` must return one of the two records, or whether it may merge them into a new record.

> Just FYI for all interested folks. Precombine is not just used to dedup two records within the same incoming batch, but also to deduce the winner when we [merge records in LogRecordReader.](https://github.com/apache/hudi/blob/907e60c252e80be5ef3a848d773e0f866eb609f9/hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieMergedLogRecordScanner.java#L145)
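To make the two `preCombine` behaviours under discussion concrete, here is a minimal, self-contained sketch. It deliberately does not use Hudi's payload API: `Rec`, `pickLatest`, and `mergePartial` are hypothetical names invented for illustration, and treating a `null` field as "not supplied by this update" is an assumption, not something the PR specifies.

```java
import java.util.HashMap;
import java.util.Map;

public class PartialUpdateSketch {

    /** Hypothetical stand-in for a record: an ordering value plus nullable fields. */
    static final class Rec {
        final long orderingVal;
        final Map<String, Object> fields;

        Rec(long orderingVal, Map<String, Object> fields) {
            this.orderingVal = orderingVal;
            this.fields = fields;
        }
    }

    /** "Pick a winner" semantics: return one of the two inputs, the more recent one. */
    static Rec pickLatest(Rec a, Rec b) {
        return b.orderingVal >= a.orderingVal ? b : a;
    }

    /**
     * "Merge into a new record" semantics: start from the older record and
     * overlay every non-null field of the newer one (assumed convention).
     */
    static Rec mergePartial(Rec a, Rec b) {
        Rec older = a.orderingVal <= b.orderingVal ? a : b;
        Rec newer = (older == a) ? b : a;
        Map<String, Object> merged = new HashMap<>(older.fields);
        for (Map.Entry<String, Object> e : newer.fields.entrySet()) {
            if (e.getValue() != null) {
                merged.put(e.getKey(), e.getValue());
            }
        }
        return new Rec(newer.orderingVal, merged);
    }

    public static void main(String[] args) {
        Map<String, Object> f1 = new HashMap<>();
        f1.put("name", "a");
        f1.put("price", 10);
        Map<String, Object> f2 = new HashMap<>();
        f2.put("name", null); // this update does not carry a name
        f2.put("price", 12);

        Rec r1 = new Rec(1L, f1);
        Rec r2 = new Rec(2L, f2);

        // Winner-only: the newer record is returned as-is, so "name" stays null.
        System.out.println(pickLatest(r1, r2).fields);
        // Partial update: "name" is carried over from the older record.
        System.out.println(mergePartial(r1, r2).fields);
    }
}
```

Whichever semantics is chosen would need to be applied consistently in all three places listed above (batch dedup, log read, compaction); otherwise a field kept by one path could be dropped by another.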
