yihua commented on code in PR #9593:
URL: https://github.com/apache/hudi/pull/9593#discussion_r1316430431


##########
hudi-spark-datasource/hudi-spark-common/src/main/java/org/apache/hudi/HoodieSparkRecordMerger.java:
##########
@@ -38,39 +38,51 @@ public String getMergingStrategy() {
   }
 
   @Override
-  public Option<Pair<HoodieRecord, Schema>> merge(HoodieRecord older, Schema oldSchema, HoodieRecord newer, Schema newSchema, TypedProperties props) throws IOException {
-    ValidationUtils.checkArgument(older.getRecordType() == HoodieRecordType.SPARK);
-    ValidationUtils.checkArgument(newer.getRecordType() == HoodieRecordType.SPARK);
+  public Option<Pair<HoodieRecord, Schema>> merge(

Review Comment:
   As a follow-up, could you implement the logic of 
`DefaultHoodieRecordPayload` and the custom merging strategy mentioned in #9430 
as new merger strategies, and make sure the new `merge` API covers all the 
functionality we currently support with `HoodieRecordPayload`?
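
To make the request concrete, here is a rough sketch of an ordering-based merge. It assumes the semantics to port are "the record with the higher ordering value wins, ties go to the incoming record, and a winning delete yields no record". `OrderingMergeSketch` and `Rec` are hypothetical stand-ins for illustration, not Hudi classes, and the actual strategy would need to work with `HoodieRecord` and the schemas passed to `merge`.

```java
import java.util.Optional;

/** Hypothetical sketch of an ordering-based merge; not a Hudi API. */
public class OrderingMergeSketch {

  /** Minimal illustrative record: key, ordering value, payload, delete flag. */
  public static final class Rec {
    public final String key;
    public final long ordering;   // e.g. an event timestamp (assumption)
    public final String value;
    public final boolean deleted;

    public Rec(String key, long ordering, String value, boolean deleted) {
      this.key = key;
      this.ordering = ordering;
      this.value = value;
      this.deleted = deleted;
    }
  }

  /**
   * Picks the record with the higher ordering value; on a tie the newer
   * (incoming) record wins. Returns empty when the winner is a delete.
   */
  public static Optional<Rec> merge(Rec older, Rec newer) {
    Rec winner = newer.ordering >= older.ordering ? newer : older;
    return winner.deleted ? Optional.empty() : Optional.of(winner);
  }

  public static void main(String[] args) {
    Rec persisted = new Rec("k", 5L, "persisted", false);
    Rec lateArrival = new Rec("k", 3L, "late", false);
    // The late arrival has a lower ordering value, so the persisted record is kept.
    System.out.println(merge(persisted, lateArrival).get().value);
  }
}
```

A custom strategy would replace the comparison in `merge` with its own rule while keeping the same "empty means delete" contract.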



##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/Iterators.scala:
##########
@@ -244,19 +244,13 @@ class RecordMergingFileIterator(logFiles: List[HoodieLogFile],
       val curRow = baseFileIterator.next()
       val curKey = curRow.getString(recordKeyOrdinal)
       val updatedRecordOpt = removeLogRecord(curKey)
-      if (updatedRecordOpt.isEmpty) {
-        // No merge is required, simply load current row and project into required schema
-        nextRecord = requiredSchemaProjection(curRow)
-        true
-      } else {
-       val mergedRecordOpt = merge(curRow, updatedRecordOpt.get)
-        if (mergedRecordOpt.isEmpty) {
+      val mergedRecordOpt = merge(curRow, updatedRecordOpt)

Review Comment:
   I think you still need to check `updatedRecordOpt.isEmpty`: if there is no 
update for the record key, return the record from the base file as-is.
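
The control flow I have in mind looks roughly like the sketch below. It is written in plain Java with strings standing in for rows; `BaseFileMergeSketch`, `resolve`, and the placeholder `merge` are hypothetical names for illustration only, not the iterator's real signatures.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical stand-in for the iterator logic; none of these names are Hudi APIs. */
public class BaseFileMergeSketch {

  // Pending log-file updates keyed by record key (stand-in for the log record map).
  private final Map<String, String> logRecords = new HashMap<>();

  public BaseFileMergeSketch(Map<String, String> updates) {
    logRecords.putAll(updates);
  }

  // Removes and returns the pending update for a key, if any.
  private Optional<String> removeLogRecord(String key) {
    return Optional.ofNullable(logRecords.remove(key));
  }

  // Placeholder merge: the update replaces the base row; an empty string models a delete.
  private Optional<String> merge(String baseRow, String update) {
    return update.isEmpty() ? Optional.empty() : Optional.of(update);
  }

  /** Returns the row to emit for a base-file row, or empty if the merge deleted it. */
  public Optional<String> resolve(String key, String baseRow) {
    Optional<String> updated = removeLogRecord(key);
    if (!updated.isPresent()) {
      // No update for this key: skip the merge and return the base-file row.
      return Optional.of(baseRow);
    }
    return merge(baseRow, updated.get());
  }

  public static void main(String[] args) {
    Map<String, String> updates = new HashMap<>();
    updates.put("k1", "v1-new");
    BaseFileMergeSketch sketch = new BaseFileMergeSketch(updates);
    System.out.println(sketch.resolve("k0", "v0")); // key without an update
    System.out.println(sketch.resolve("k1", "v1")); // key with an update
  }
}
```

The early return keeps rows without log updates out of the merger entirely, so the merger only ever sees a real pair of records.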


