alexeykudinkin commented on a change in pull request #4724:
URL: https://github.com/apache/hudi/pull/4724#discussion_r814333200
##########
File path:
hudi-common/src/main/java/org/apache/hudi/common/model/HoodieRecordPayload.java
##########
@@ -58,6 +58,31 @@ default T preCombine(T oldValue, Properties properties) {
return preCombine(oldValue);
}
+  /**
+   * When more than one HoodieRecord has the same HoodieKey in the incoming batch, this function
+   * combines them before attempting to insert/upsert, using the supplied property map.
+   *
+   * @param oldValue   instance of the old {@link HoodieRecordPayload} to be combined with.
+   * @param properties payload-related properties; for example, the ordering field name(s) to
+   *                   extract from the value in storage.
+   * @param schema     schema used for the record.
+   * @return the combined value.
+   */
+ @PublicAPIMethod(maturity = ApiMaturityLevel.STABLE)
+ default T preCombine(T oldValue, Properties properties, Schema schema) {
Review comment:
Right, that's exactly my question: why do you want to implement such
semantics within `preCombine`? What use case are you trying to accommodate
here?
Essentially, with this change you introduce a way for 2 records within the
batch to be combined into 1. But why do you need this?
After all, you can achieve the same goal if you simply stop de-duping your
records and then subsequently merge them against what is on disk.
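For context, a minimal dependency-free sketch of the `preCombine` semantics under discussion (combining two records that share a key by keeping the one with the higher ordering value). All names here are illustrative assumptions, not the actual Hudi API:

```java
// Hypothetical sketch of preCombine-style semantics: when two payloads in
// the incoming batch share the same key, keep the one with the greater
// ordering value. This mirrors the contract being discussed, not Hudi itself.
public class PreCombineSketch {

  static final class Payload {
    final String value;       // illustrative record content
    final long orderingVal;   // illustrative ordering field

    Payload(String value, long orderingVal) {
      this.value = value;
      this.orderingVal = orderingVal;
    }

    // Combine this payload with an older one; the payload with the
    // higher ordering value wins.
    Payload preCombine(Payload oldValue) {
      return oldValue.orderingVal > this.orderingVal ? oldValue : this;
    }
  }

  public static void main(String[] args) {
    Payload older = new Payload("v1", 1L);
    Payload newer = new Payload("v2", 2L);
    Payload combined = newer.preCombine(older);
    System.out.println(combined.value); // prints "v2"
  }
}
```

The alternative the comment suggests is to skip this in-batch combining entirely and let each record merge against the on-disk copy instead.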
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]