yihua commented on code in PR #11881:
URL: https://github.com/apache/hudi/pull/11881#discussion_r1757596125
##########
hudi-common/src/main/java/org/apache/hudi/common/config/HoodieReaderConfig.java:
##########
@@ -72,4 +72,18 @@ public class HoodieReaderConfig extends HoodieConfig {
.markAdvanced()
.sinceVersion("1.0.0")
.withDocumentation("Whether to use positions in the block header for data blocks containing updates and delete blocks for merging.");
+
+ public static final String REALTIME_SKIP_MERGE = "skip_merge";
+ public static final String REALTIME_PAYLOAD_COMBINE = "payload_combine";
+ public static final String SHOULD_ENABLE_SKIP_MERGE_BOOL = "hoodie.realtime.merge.skip";
+ @AdvancedConfig
Review Comment:
This should only be used by the Flink options.
```suggestion
```
##########
hudi-common/src/main/java/org/apache/hudi/common/config/HoodieReaderConfig.java:
##########
@@ -72,4 +72,18 @@ public class HoodieReaderConfig extends HoodieConfig {
.markAdvanced()
.sinceVersion("1.0.0")
.withDocumentation("Whether to use positions in the block header for data blocks containing updates and delete blocks for merging.");
+
+ public static final String REALTIME_SKIP_MERGE = "skip_merge";
+ public static final String REALTIME_PAYLOAD_COMBINE = "payload_combine";
+ public static final String SHOULD_ENABLE_SKIP_MERGE_BOOL = "hoodie.realtime.merge.skip";
Review Comment:
We can keep this config key within the `hudi-hadoop-mr` module.
```suggestion
```
##########
hudi-common/src/main/java/org/apache/hudi/common/config/HoodieReaderConfig.java:
##########
@@ -72,4 +72,18 @@ public class HoodieReaderConfig extends HoodieConfig {
.markAdvanced()
.sinceVersion("1.0.0")
.withDocumentation("Whether to use positions in the block header for data blocks containing updates and delete blocks for merging.");
+
+ public static final String REALTIME_SKIP_MERGE = "skip_merge";
+ public static final String REALTIME_PAYLOAD_COMBINE = "payload_combine";
+ public static final String SHOULD_ENABLE_SKIP_MERGE_BOOL = "hoodie.realtime.merge.skip";
+ @AdvancedConfig
+ public static final ConfigProperty<String> MERGE_TYPE = ConfigProperty
+ .key("hoodie.datasource.merge.type")
+ .defaultValue(REALTIME_PAYLOAD_COMBINE)
+ .markAdvanced()
+ .withValidValues(REALTIME_PAYLOAD_COMBINE, REALTIME_SKIP_MERGE)
+ .withDocumentation("For Snapshot query on merge on read table. Use this key to define how the payloads are merged, in\n"
+ + "1) skip_merge: read the base file records plus the log file records;\n"
+ + "2) payload_combine: read the base file records first, for each record in base file, checks whether the key is in the\n"
+ + " log file records(combines the two records with same key for base and log file records), then read the left log file records");
Review Comment:
```suggestion
+ "1) skip_merge: read the base file records plus the log file records without merging;\n"
+ "2) payload_combine: read the base file records first, for each record in base file, checks whether the key is in the\n"
+ " log file records (combines the two records with same key for base and log file records), then read the left log file records");
```
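The two merge modes described in that documentation string can be illustrated with a small, self-contained sketch. Note this is not Hudi's actual reader code: `MergeTypeSketch`, its `merge` method, and the `String[]` key/value records are stand-ins chosen for illustration, with the log record simply replacing the base record as a proxy for payload-level combining.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: records are modeled as {key, value} String pairs,
// not Hudi's actual record or payload types.
public class MergeTypeSketch {

  static List<String[]> merge(String mergeType, List<String[]> base, List<String[]> log) {
    if ("skip_merge".equals(mergeType)) {
      // skip_merge: return base file records plus log file records,
      // without combining records that share a key.
      List<String[]> out = new ArrayList<>(base);
      out.addAll(log);
      return out;
    }
    // payload_combine: for each base record, check whether its key appears in
    // the log records; if so the log record wins (a stand-in for payload
    // combining), then the remaining log-only records are appended.
    Map<String, String[]> byKey = new LinkedHashMap<>();
    for (String[] r : base) {
      byKey.put(r[0], r);
    }
    for (String[] r : log) {
      byKey.put(r[0], r);
    }
    return new ArrayList<>(byKey.values());
  }

  public static void main(String[] args) {
    List<String[]> base = List.of(new String[]{"k1", "v1"}, new String[]{"k2", "v2"});
    List<String[]> log = List.of(new String[]{"k2", "v2-updated"}, new String[]{"k3", "v3"});
    // skip_merge keeps all 4 records, including both versions of k2.
    System.out.println(merge("skip_merge", base, log).size());
    // payload_combine yields 3 records: k1, combined k2, and k3.
    System.out.println(merge("payload_combine", base, log).size());
  }
}
```

The key observable difference: under `skip_merge` a key updated in the log appears twice in the output, while `payload_combine` produces exactly one record per key.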
##########
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HoodieInputFormatUtils.java:
##########
@@ -573,7 +572,6 @@ public static String getTableBasePath(InputSplit split, JobConf jobConf) throws
*/
public static boolean shouldUseFilegroupReader(final JobConf jobConf) {
     return jobConf.getBoolean(HoodieReaderConfig.FILE_GROUP_READER_ENABLED.key(), HoodieReaderConfig.FILE_GROUP_READER_ENABLED.defaultValue())
-        && !jobConf.getBoolean(HoodieCommonConfig.SCHEMA_EVOLUTION_ENABLE.key(), HoodieCommonConfig.SCHEMA_EVOLUTION_ENABLE.defaultValue())
-        && !jobConf.getBoolean(HoodieRealtimeRecordReader.REALTIME_SKIP_MERGE_PROP, false);
Review Comment:
Should we add a simple unit test at the file group layer to test skip merging in `TestHoodieFileGroupReaderBase`?
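Before this change, the guard in `shouldUseFilegroupReader` reduced to three boolean conditions, and the removed `-` lines dropped the schema-evolution and skip-merge exclusions. A minimal sketch of the pre-PR predicate, with plain booleans standing in for the `JobConf` lookups (the class and method names here are illustrative, not Hudi's):

```java
// Illustrative sketch: plain booleans stand in for the JobConf.getBoolean
// lookups of FILE_GROUP_READER_ENABLED, SCHEMA_EVOLUTION_ENABLE, and the
// realtime skip-merge property.
public class FileGroupReaderGuard {

  static boolean shouldUseFileGroupReader(boolean fileGroupReaderEnabled,
                                          boolean schemaEvolutionEnabled,
                                          boolean skipMerge) {
    // Pre-PR logic: use the file group reader only when it is enabled and
    // neither schema evolution nor skip-merge is requested.
    return fileGroupReaderEnabled && !schemaEvolutionEnabled && !skipMerge;
  }

  public static void main(String[] args) {
    System.out.println(shouldUseFileGroupReader(true, false, false));
    System.out.println(shouldUseFileGroupReader(true, false, true));
  }
}
```

With the `-` lines removed, only the first condition remains, so skip-merge queries now route through the file group reader as well, which is what motivates the request for skip-merge coverage at that layer.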
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]