danny0405 commented on code in PR #9593:
URL: https://github.com/apache/hudi/pull/9593#discussion_r1339652411
##########
hudi-common/src/main/java/org/apache/hudi/common/model/HoodieAvroRecordMerger.java:
##########
@@ -43,9 +43,29 @@ public String getMergingStrategy() {
}
@Override
- public Option<Pair<HoodieRecord, Schema>> merge(HoodieRecord older, Schema
oldSchema, HoodieRecord newer, Schema newSchema, TypedProperties props) throws
IOException {
- return combineAndGetUpdateValue(older, newer, newSchema, props)
- .map(r -> Pair.of(new HoodieAvroIndexedRecord(r), r.getSchema()));
+ public Option<Pair<HoodieRecord, Schema>> merge(Option<HoodieRecord> older,
+ Schema oldSchema,
+ Option<HoodieRecord> newer,
+ Schema newSchema,
+ TypedProperties props)
throws IOException {
+ boolean isValidNew = isValid(newer);
+ boolean isValidOld = isValid(older);
+ boolean isDeleteNew = isValidNew && isDelete(newer, newSchema, props);
+
+ if (!isValidOld && !isValidNew) { // No meaningful information found.
+ return Option.empty();
+ } else if (isValidOld && !isValidNew) { // Return old record for data
safety.
+ return Option.of(Pair.of(older.get(), oldSchema));
+ } else if (!isValidOld) { // Either insert or delete case, return the new
record.
+ return Option.of(Pair.of(newer.get(), newSchema));
+ } else {
+ if (isDeleteNew) { // delete case
+ return Option.of(Pair.of(newer.get(), newSchema));
+ }
Review Comment:
If you look at the APIs from Kafka and Flink, they either uses a T or
specific `RowData` there, and I guess both of the old and new values are not
nullable, but take a look of our API:
1. we make it as option so the user needs to take care whether they are
empty and they even needs to disinduish UPSERT and DELETES, which is totally
unnecessary.
2. We have a HoodieRecord there, but there is many specific logic for
hadling deletes inside Hudi which should not explose to user totally.
3. We have many ramifications for write paths for different representations
of delete `HoodieRecord`, like a HoodieRecord with D as operation, a
HoodieAvroEmptyRecord, a IGNORE record, I don't think any normal user should
see that complexity, even we make a default impls there, the user still needs
to undertand it and have a good comprehension to use it correctly.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]