yihua commented on code in PR #11881:
URL: https://github.com/apache/hudi/pull/11881#discussion_r1755957174
##########
hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieFileGroupReader.java:
##########
@@ -108,7 +108,8 @@ public HoodieFileGroupReader(HoodieReaderContext<T> readerContext,
readerContext.setRecordMerger(this.recordMerger);
readerContext.setTablePath(tablePath);
readerContext.setLatestCommitTime(latestCommitTime);
- readerContext.setShouldMergeUseRecordPosition(shouldUseRecordPosition);
+ boolean isSkipMerge = props.getString("hoodie.datasource.merge.type", "payload_combine").equalsIgnoreCase("skip_merge");
Review Comment:
At the very least, those common static Strings should be defined in a common place in the `hudi-common` module.
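For illustration, the reader-side check could then reference shared constants instead of inline literals. This is only a sketch: the class and constant names are assumptions, and plain `java.util.Properties` stands in for Hudi's `TypedProperties`; it mirrors the default-then-compare logic in the diff above.

```java
import java.util.Properties;

public class MergeTypeLookup {
  // Hypothetical shared constants (would live somewhere in hudi-common,
  // e.g. a HoodieReaderConfig-style class, per the review suggestion).
  static final String MERGE_TYPE_KEY = "hoodie.datasource.merge.type";
  static final String PAYLOAD_COMBINE = "payload_combine";
  static final String SKIP_MERGE = "skip_merge";

  // Mirrors the diff: fall back to payload_combine, then test for skip_merge
  // case-insensitively.
  static boolean isSkipMerge(Properties props) {
    return props.getProperty(MERGE_TYPE_KEY, PAYLOAD_COMBINE).equalsIgnoreCase(SKIP_MERGE);
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    System.out.println(isSkipMerge(props));            // default -> false
    props.setProperty(MERGE_TYPE_KEY, "SKIP_MERGE");
    System.out.println(isSkipMerge(props));            // case-insensitive -> true
  }
}
```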
##########
hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieFileGroupReader.java:
##########
@@ -108,7 +108,8 @@ public HoodieFileGroupReader(HoodieReaderContext<T> readerContext,
readerContext.setRecordMerger(this.recordMerger);
readerContext.setTablePath(tablePath);
readerContext.setLatestCommitTime(latestCommitTime);
- readerContext.setShouldMergeUseRecordPosition(shouldUseRecordPosition);
+ boolean isSkipMerge = props.getString("hoodie.datasource.merge.type", "payload_combine").equalsIgnoreCase("skip_merge");
Review Comment:
Could you move the declarations of `hoodie.datasource.merge.type`, `payload_combine`, and `skip_merge` to `HoodieReaderConfig`, and remove the redundant declarations in the Spark and Flink modules if possible:
Spark's `DataSourceOptions`
```
val REALTIME_SKIP_MERGE_OPT_VAL = "skip_merge"
val REALTIME_PAYLOAD_COMBINE_OPT_VAL = "payload_combine"
val REALTIME_MERGE: ConfigProperty[String] = ConfigProperty
  .key("hoodie.datasource.merge.type")
  .defaultValue(REALTIME_PAYLOAD_COMBINE_OPT_VAL)
  .withValidValues(REALTIME_SKIP_MERGE_OPT_VAL, REALTIME_PAYLOAD_COMBINE_OPT_VAL)
  .markAdvanced()
  .withDocumentation("For Snapshot query on merge on read table, control whether we invoke the record " +
    s"payload implementation to merge (${REALTIME_PAYLOAD_COMBINE_OPT_VAL}) or skip merging altogether " +
    s"(${REALTIME_SKIP_MERGE_OPT_VAL})")
```
Flink's `FlinkOptions`:
```
public static final String REALTIME_SKIP_MERGE = "skip_merge";
public static final String REALTIME_PAYLOAD_COMBINE = "payload_combine";

@AdvancedConfig
public static final ConfigOption<String> MERGE_TYPE = ConfigOptions
    .key("hoodie.datasource.merge.type")
    .stringType()
    .defaultValue(REALTIME_PAYLOAD_COMBINE)
    .withDescription("For Snapshot query on merge on read table. Use this key to define how the payloads are merged, in\n"
        + "1) skip_merge: read the base file records plus the log file records;\n"
        + "2) payload_combine: read the base file records first, for each record in base file, checks whether the key is in the\n"
        + "   log file records(combines the two records with same key for base and log file records), then read the left log file records");
```
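A consolidated declaration in `HoodieReaderConfig` might look like the following. This is a sketch only: the constant names and their placement are assumptions, and the builder calls simply mirror the Spark `ConfigProperty` declaration quoted above rather than any existing Hudi code.

```java
// Hypothetical addition to HoodieReaderConfig in hudi-common
// (constant names are illustrative, modeled on the Spark declaration above).
public static final String REALTIME_SKIP_MERGE = "skip_merge";
public static final String REALTIME_PAYLOAD_COMBINE = "payload_combine";

public static final ConfigProperty<String> MERGE_TYPE = ConfigProperty
    .key("hoodie.datasource.merge.type")
    .defaultValue(REALTIME_PAYLOAD_COMBINE)
    .withValidValues(REALTIME_SKIP_MERGE, REALTIME_PAYLOAD_COMBINE)
    .markAdvanced()
    .withDocumentation("For Snapshot query on merge on read table, control whether the record "
        + "payload implementation is invoked to merge base and log records (payload_combine) "
        + "or merging is skipped altogether (skip_merge).");
```

Spark's `DataSourceOptions` and Flink's `FlinkOptions` could then alias or delegate to this single definition so the key and its valid values are declared once.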
##########
hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieFileGroupReader.java:
##########
@@ -108,7 +108,8 @@ public HoodieFileGroupReader(HoodieReaderContext<T> readerContext,
readerContext.setRecordMerger(this.recordMerger);
readerContext.setTablePath(tablePath);
readerContext.setLatestCommitTime(latestCommitTime);
- readerContext.setShouldMergeUseRecordPosition(shouldUseRecordPosition);
+ boolean isSkipMerge = props.getString("hoodie.datasource.merge.type", "payload_combine").equalsIgnoreCase("skip_merge");
Review Comment:
Does Hive use the same config name?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]