danny0405 commented on code in PR #9593:
URL: https://github.com/apache/hudi/pull/9593#discussion_r1339652411


##########
hudi-common/src/main/java/org/apache/hudi/common/model/HoodieAvroRecordMerger.java:
##########
@@ -43,9 +43,29 @@ public String getMergingStrategy() {
   }
 
   @Override
-  public Option<Pair<HoodieRecord, Schema>> merge(HoodieRecord older, Schema 
oldSchema, HoodieRecord newer, Schema newSchema, TypedProperties props) throws 
IOException {
-    return combineAndGetUpdateValue(older, newer, newSchema, props)
-        .map(r -> Pair.of(new HoodieAvroIndexedRecord(r), r.getSchema()));
+  public Option<Pair<HoodieRecord, Schema>> merge(Option<HoodieRecord> older,
+                                                  Schema oldSchema,
+                                                  Option<HoodieRecord> newer,
+                                                  Schema newSchema,
+                                                  TypedProperties props) 
throws IOException {
+    boolean isValidNew = isValid(newer);
+    boolean isValidOld = isValid(older);
+    boolean isDeleteNew = isValidNew && isDelete(newer, newSchema, props);
+
+    if (!isValidOld && !isValidNew) { // No meaningful information found.
+      return Option.empty();
+    } else if (isValidOld && !isValidNew) { // Return old record for data 
safety.
+      return Option.of(Pair.of(older.get(), oldSchema));
+    } else if (!isValidOld) { // Either insert or delete case, return the new 
record.
+      return Option.of(Pair.of(newer.get(), newSchema));
+    } else {
+      if (isDeleteNew) { // delete case
+        return Option.of(Pair.of(newer.get(), newSchema));
+      }

Review Comment:
   If you look at the APIs from Kafka and Flink, they either uses a T or 
specific `RowData` there, and I guess both of the old and new values are not 
nullable, but take a look of our API:
   
   1. we make it as option so the user needs to take care whether they are 
empty and they even needs to disinduish UPSERT and DELETES, which is totally 
unnecessary.
   2. We have a HoodieRecord there, but there is many specific logic for 
hadling deletes inside Hudi which should not explose to user totally.
   3. We have many ramifications for write paths for different representations 
of delete `HoodieRecord`, like a HoodieRecord with D as operation, a 
HoodieAvroEmptyRecord, a IGNORE record, I don't think any normal user should 
see that complexity, even we make a default impls there, the user still needs 
to undertand it and have a good comprehension to use it correctly.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to