alexeykudinkin commented on a change in pull request #4724:
URL: https://github.com/apache/hudi/pull/4724#discussion_r814333200



##########
File path: 
hudi-common/src/main/java/org/apache/hudi/common/model/HoodieRecordPayload.java
##########
@@ -58,6 +58,31 @@ default T preCombine(T oldValue, Properties properties) {
     return preCombine(oldValue);
   }
 
+  /**
+   * When more than one HoodieRecord have the same HoodieKey in the incoming batch, this function combines them before attempting to insert/upsert by taking in a property map.
+   *
+   * @param oldValue instance of the old {@link HoodieRecordPayload} to be combined with.
+   * @param properties Payload related properties. For example pass the ordering field(s) name to extract from value in storage.
+   * @param schema Schema used for record
+   * @return the combined value
+   * @param schema Schema used for record
+   * @return the combined value
+   */
+  @PublicAPIMethod(maturity = ApiMaturityLevel.STABLE)
+  default T preCombine(T oldValue, Properties properties, Schema schema) {

Review comment:
       Right, that's exactly my question: why do you want to implement such semantics within `preCombine`? What use case are you trying to accommodate here?
   
   Essentially, with this change you introduce a way for 2 records within the batch to be combined into 1. But why do you need this?
   
   After all, you can achieve the same goal if you simply stop de-duping your records, and then subsequently merge them against what is on disk
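
   To make the distinction concrete, here is a minimal, self-contained sketch of the two merge points being discussed. It uses plain Java stand-ins with hypothetical names (`Payload`, `orderingVal`), not the real Hudi `HoodieRecordPayload` API, and simply keeps the record with the larger ordering value at each step:

```java
// Illustrative sketch only: models the two places merging can happen,
// using hypothetical stand-in types rather than real Hudi classes.
public class PreCombineSketch {

    // Stand-in for a record payload carrying a value and an ordering field.
    static final class Payload {
        final String value;
        final long orderingVal;
        Payload(String value, long orderingVal) {
            this.value = value;
            this.orderingVal = orderingVal;
        }
    }

    // preCombine: dedupes two records sharing a key *within the incoming
    // batch*, keeping the one with the larger ordering value.
    static Payload preCombine(Payload oldValue, Payload newValue) {
        return newValue.orderingVal >= oldValue.orderingVal ? newValue : oldValue;
    }

    // Merge against what is already on disk -- the step where, per the
    // comment above, custom merge semantics can live instead, if in-batch
    // de-duping is simply switched off.
    static Payload combineAndGetUpdateValue(Payload onDisk, Payload incoming) {
        return preCombine(onDisk, incoming);
    }

    public static void main(String[] args) {
        Payload a = new Payload("v1", 1);
        Payload b = new Payload("v2", 2);
        Payload survivor = preCombine(a, b);               // in-batch dedupe
        Payload onDisk = new Payload("v0", 0);
        Payload result = combineAndGetUpdateValue(onDisk, survivor);
        System.out.println(result.value);                  // prints "v2"
    }
}
```

   The point of the sketch: if the same "keep the latest" logic is applied when merging against storage, combining records within the batch first is redundant, which is the reviewer's argument.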




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
