yihua commented on code in PR #11943:
URL: https://github.com/apache/hudi/pull/11943#discussion_r1805528720


##########
hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/BaseSparkInternalRowReaderContext.java:
##########
@@ -49,8 +52,18 @@
 public abstract class BaseSparkInternalRowReaderContext extends HoodieReaderContext<InternalRow> {
 
   @Override
-  public HoodieRecordMerger getRecordMerger(String mergerStrategy) {
-    return HoodieSparkRecordMerger.getRecordMerger(mergerStrategy);
+  public Option<HoodieRecordMerger> getRecordMerger(RecordMergeMode mergeMode, String mergerStrategy, String mergerImpls) {
+    // TODO(HUDI-7843):
+    // get rid of event time and overwrite with latest. Just return Option.empty
+    switch (mergeMode) {
+      case EVENT_TIME_ORDERING:
+        return Option.of(new DefaultSparkRecordMerger());
+      case OVERWRITE_WITH_LATEST:
+        return Option.of(new OverwriteWithLatestSparkRecordMerger());
+      case CUSTOM:
+      default:
+        return Option.of(HoodieRecordUtils.createRecordMerger(null, EngineType.SPARK, mergerImpls, mergerStrategy));

Review Comment:
   For the first argument of `HoodieRecordUtils#createRecordMerger`, should it be a `boolean` indicating whether this is a metadata table? Or, if the argument is unused, could we remove it entirely, i.e., does the MDT read flow go through this merger logic at all?
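As context for the dispatch being discussed, here is a minimal, self-contained sketch of mode-based merger selection. `RecordMergeMode`, `Merger`, and the merger names below are hypothetical simplified stand-ins, not Hudi's actual classes:

```java
// Simplified sketch of merge-mode dispatch (stand-in types, not Hudi's API).
import java.util.Optional;

public class MergeModeDispatchSketch {

  enum RecordMergeMode { EVENT_TIME_ORDERING, OVERWRITE_WITH_LATEST, CUSTOM }

  // Stand-in for HoodieRecordMerger; only carries a name for illustration.
  interface Merger {
    String name();
  }

  static Optional<Merger> getRecordMerger(RecordMergeMode mode, String mergerImpls) {
    switch (mode) {
      case EVENT_TIME_ORDERING:
        return Optional.of(() -> "event-time");       // record with the larger event time wins
      case OVERWRITE_WITH_LATEST:
        return Optional.of(() -> "overwrite-latest"); // newer record always wins
      case CUSTOM:
      default:
        // In the real code this branch resolves a user-configured
        // implementation from the merger impls / strategy settings.
        return Optional.of(() -> "custom:" + mergerImpls);
    }
  }

  public static void main(String[] args) {
    Merger m = getRecordMerger(RecordMergeMode.OVERWRITE_WITH_LATEST, "").get();
    System.out.println(m.name()); // prints "overwrite-latest"
  }
}
```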



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieMetadataWriteUtils.java:
##########
@@ -156,7 +160,12 @@ public static HoodieWriteConfig createMetadataWriteConfig(
         .withKeyGenerator(HoodieTableMetadataKeyGenerator.class.getCanonicalName())
         .withPopulateMetaFields(DEFAULT_METADATA_POPULATE_META_FIELDS)
         .withWriteStatusClass(FailOnFirstErrorWriteStatus.class)
-        .withReleaseResourceEnabled(writeConfig.areReleaseResourceEnabled());
+        .withReleaseResourceEnabled(writeConfig.areReleaseResourceEnabled())
+        .withRecordMergeMode(RecordMergeMode.CUSTOM)
+        .withRecordMergerStrategy(HoodieRecordMerger.PAYLOAD_BASED_MERGER_STRATEGY_UUDID)
+        .withPayloadConfig(HoodiePayloadConfig.newBuilder()

Review Comment:
   Is this because the payload type was previously set to `HOODIE_METADATA`, which has now been removed?



##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/OverwriteWithLatestSparkRecordMerger.java:
##########
@@ -42,10 +42,4 @@ public String getMergingStrategy() {
   public Option<Pair<HoodieRecord, Schema>> merge(HoodieRecord older, Schema oldSchema, HoodieRecord newer, Schema newSchema, TypedProperties props) throws IOException {
     return Option.of(Pair.of(newer, newSchema));
   }
-
-  @Override
-  public HoodieRecord.HoodieRecordType getRecordType() {
-    return null;

Review Comment:
   Do you know why this returned null before?
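For illustration of why a null record type is surprising, here is a hedged, hypothetical sketch (simplified stand-in types, not Hudi's real API): callers typically branch on the merger's record type to pick an engine-specific path, so a null return pushes a defensive check onto every caller.

```java
// Hypothetical illustration only: RecordType and Merger are simplified
// stand-ins, not Hudi's real classes.
public class RecordTypeSketch {

  enum RecordType { AVRO, SPARK }

  interface Merger {
    RecordType getRecordType();
  }

  static String pickReadPath(Merger merger) {
    RecordType type = merger.getRecordType();
    if (type == null) {
      // A null record type leaves the engine-specific path undefined.
      return "undefined";
    }
    return type == RecordType.SPARK ? "InternalRow path" : "Avro path";
  }

  public static void main(String[] args) {
    System.out.println(pickReadPath(() -> RecordType.SPARK)); // prints "InternalRow path"
    System.out.println(pickReadPath(() -> null));             // prints "undefined"
  }
}
```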



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java:
##########
@@ -168,21 +167,21 @@ public class HoodieWriteConfig extends HoodieConfig {
 
   public static final ConfigProperty<String> WRITE_PAYLOAD_CLASS_NAME = ConfigProperty
       .key("hoodie.datasource.write.payload.class")
-      .defaultValue(DefaultHoodieRecordPayload.class.getName())
+      .noDefaultValue()
       .markAdvanced()
+      .deprecatedAfter("1.0.0")
       .withDocumentation("Payload class used. Override this, if you like to roll your own merge logic, when upserting/inserting. "
           + "This will render any value set for PRECOMBINE_FIELD_OPT_VAL in-effective");
 
-  public static final ConfigProperty<String> WRITE_PAYLOAD_TYPE = ConfigProperty
-      .key("hoodie.datasource.write.payload.type")

Review Comment:
   Should we also remove the `RecordPayloadType` class and its usages?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]