yihua commented on code in PR #11881:
URL: https://github.com/apache/hudi/pull/11881#discussion_r1757596125
##########
hudi-common/src/main/java/org/apache/hudi/common/config/HoodieReaderConfig.java:
##########
@@ -72,4 +72,18 @@ public class HoodieReaderConfig extends HoodieConfig {
.markAdvanced()
.sinceVersion("1.0.0")
.withDocumentation("Whether to use positions in the block header for data blocks containing updates and delete blocks for merging.");
+
+ public static final String REALTIME_SKIP_MERGE = "skip_merge";
+ public static final String REALTIME_PAYLOAD_COMBINE = "payload_combine";
+ public static final String SHOULD_ENABLE_SKIP_MERGE_BOOL = "hoodie.realtime.merge.skip";
+ @AdvancedConfig
Review Comment:
This should only be used by the Flink options.
```suggestion
```
##########
hudi-common/src/main/java/org/apache/hudi/common/config/HoodieReaderConfig.java:
##########
@@ -72,4 +72,18 @@ public class HoodieReaderConfig extends HoodieConfig {
.markAdvanced()
.sinceVersion("1.0.0")
.withDocumentation("Whether to use positions in the block header for data blocks containing updates and delete blocks for merging.");
+
+ public static final String REALTIME_SKIP_MERGE = "skip_merge";
+ public static final String REALTIME_PAYLOAD_COMBINE = "payload_combine";
+ public static final String SHOULD_ENABLE_SKIP_MERGE_BOOL = "hoodie.realtime.merge.skip";
Review Comment:
We can keep this config key within the `hudi-hadoop-mr` module.
```suggestion
```
##########
hudi-common/src/main/java/org/apache/hudi/common/config/HoodieReaderConfig.java:
##########
@@ -72,4 +72,18 @@ public class HoodieReaderConfig extends HoodieConfig {
.markAdvanced()
.sinceVersion("1.0.0")
.withDocumentation("Whether to use positions in the block header for data blocks containing updates and delete blocks for merging.");
+
+ public static final String REALTIME_SKIP_MERGE = "skip_merge";
+ public static final String REALTIME_PAYLOAD_COMBINE = "payload_combine";
+ public static final String SHOULD_ENABLE_SKIP_MERGE_BOOL = "hoodie.realtime.merge.skip";
+ @AdvancedConfig
+ public static final ConfigProperty<String> MERGE_TYPE = ConfigProperty
+ .key("hoodie.datasource.merge.type")
+ .defaultValue(REALTIME_PAYLOAD_COMBINE)
+ .markAdvanced()
+ .withValidValues(REALTIME_PAYLOAD_COMBINE, REALTIME_SKIP_MERGE)
+ .withDocumentation("For Snapshot query on merge on read table. Use this key to define how the payloads are merged, in\n"
+ + "1) skip_merge: read the base file records plus the log file records;\n"
+ + "2) payload_combine: read the base file records first, for each record in base file, checks whether the key is in the\n"
+ + " log file records(combines the two records with same key for base and log file records), then read the left log file records");
Review Comment:
```suggestion
+ "1) skip_merge: read the base file records plus the log file records without merging;\n"
+ "2) payload_combine: read the base file records first, for each record in base file, checks whether the key is in the\n"
+ " log file records (combines the two records with same key for base and log file records), then read the left log file records");
```
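The two merge modes described in that documentation string can be illustrated with a small, self-contained sketch. Note this is not Hudi's actual reader code: `MergeTypeSketch`, its `merge` method, and the `String[]` key/value records are stand-ins chosen for illustration, with the log record simply replacing the base record as a proxy for payload-level combining.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: records are modeled as {key, value} String pairs,
// not Hudi's actual record or payload types.
public class MergeTypeSketch {

  static List<String[]> merge(String mergeType, List<String[]> base, List<String[]> log) {
    if ("skip_merge".equals(mergeType)) {
      // skip_merge: return base file records plus log file records,
      // without combining records that share a key.
      List<String[]> out = new ArrayList<>(base);
      out.addAll(log);
      return out;
    }
    // payload_combine: for each base record, check whether its key appears in
    // the log records; if so the log record wins (a stand-in for payload
    // combining), then the remaining log-only records are appended.
    Map<String, String[]> byKey = new LinkedHashMap<>();
    for (String[] r : base) {
      byKey.put(r[0], r);
    }
    for (String[] r : log) {
      byKey.put(r[0], r);
    }
    return new ArrayList<>(byKey.values());
  }

  public static void main(String[] args) {
    List<String[]> base = List.of(new String[]{"k1", "v1"}, new String[]{"k2", "v2"});
    List<String[]> log = List.of(new String[]{"k2", "v2-updated"}, new String[]{"k3", "v3"});
    // skip_merge keeps all 4 records, including both versions of k2.
    System.out.println(merge("skip_merge", base, log).size());
    // payload_combine yields 3 records: k1, combined k2, and k3.
    System.out.println(merge("payload_combine", base, log).size());
  }
}
```

The key observable difference: under `skip_merge` a key updated in the log appears twice in the output, while `payload_combine` produces exactly one record per key.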
##########
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HoodieInputFormatUtils.java:
##########
@@ -573,7 +572,6 @@ public static String getTableBasePath(InputSplit split, JobConf jobConf) throws
*/
public static boolean shouldUseFilegroupReader(final JobConf jobConf) {
     return jobConf.getBoolean(HoodieReaderConfig.FILE_GROUP_READER_ENABLED.key(), HoodieReaderConfig.FILE_GROUP_READER_ENABLED.defaultValue())
-        && !jobConf.getBoolean(HoodieCommonConfig.SCHEMA_EVOLUTION_ENABLE.key(), HoodieCommonConfig.SCHEMA_EVOLUTION_ENABLE.defaultValue())
-        && !jobConf.getBoolean(HoodieRealtimeRecordReader.REALTIME_SKIP_MERGE_PROP, false);
Review Comment:
Should we add a simple unit test at the file group layer to test skip merging in `TestHoodieFileGroupReaderBase`?
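Before this change, the guard in `shouldUseFilegroupReader` reduced to three boolean conditions, and the removed `-` lines dropped the schema-evolution and skip-merge exclusions. A minimal sketch of the pre-PR predicate, with plain booleans standing in for the `JobConf` lookups (the class and method names here are illustrative, not Hudi's):

```java
// Illustrative sketch: plain booleans stand in for the JobConf.getBoolean
// lookups of FILE_GROUP_READER_ENABLED, SCHEMA_EVOLUTION_ENABLE, and the
// realtime skip-merge property.
public class FileGroupReaderGuard {

  static boolean shouldUseFileGroupReader(boolean fileGroupReaderEnabled,
                                          boolean schemaEvolutionEnabled,
                                          boolean skipMerge) {
    // Pre-PR logic: use the file group reader only when it is enabled and
    // neither schema evolution nor skip-merge is requested.
    return fileGroupReaderEnabled && !schemaEvolutionEnabled && !skipMerge;
  }

  public static void main(String[] args) {
    System.out.println(shouldUseFileGroupReader(true, false, false));
    System.out.println(shouldUseFileGroupReader(true, false, true));
  }
}
```

With the `-` lines removed, only the first condition remains, so skip-merge queries now route through the file group reader as well, which is what motivates the request for skip-merge coverage at that layer.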
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]