yihua commented on code in PR #11881:
URL: https://github.com/apache/hudi/pull/11881#discussion_r1755957174
##########
hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieFileGroupReader.java:
##########
@@ -108,7 +108,8 @@ public HoodieFileGroupReader(HoodieReaderContext<T> readerContext,
readerContext.setRecordMerger(this.recordMerger);
readerContext.setTablePath(tablePath);
readerContext.setLatestCommitTime(latestCommitTime);
- readerContext.setShouldMergeUseRecordPosition(shouldUseRecordPosition);
+ boolean isSkipMerge = props.getString("hoodie.datasource.merge.type", "payload_combine").equalsIgnoreCase("skip_merge");
Review Comment:
At the very least, those common static Strings should be defined in a common place in the `hudi-common` module.
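For illustration, the reader-side check could then reference shared constants instead of inline literals. This is only a sketch: the class and constant names are assumptions, and plain `java.util.Properties` stands in for Hudi's `TypedProperties`; it mirrors the default-then-compare logic in the diff above.

```java
import java.util.Properties;

public class MergeTypeLookup {
  // Hypothetical shared constants (would live somewhere in hudi-common,
  // e.g. a HoodieReaderConfig-style class, per the review suggestion).
  static final String MERGE_TYPE_KEY = "hoodie.datasource.merge.type";
  static final String PAYLOAD_COMBINE = "payload_combine";
  static final String SKIP_MERGE = "skip_merge";

  // Mirrors the diff: fall back to payload_combine, then test for skip_merge
  // case-insensitively.
  static boolean isSkipMerge(Properties props) {
    return props.getProperty(MERGE_TYPE_KEY, PAYLOAD_COMBINE).equalsIgnoreCase(SKIP_MERGE);
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    System.out.println(isSkipMerge(props));            // default -> false
    props.setProperty(MERGE_TYPE_KEY, "SKIP_MERGE");
    System.out.println(isSkipMerge(props));            // case-insensitive -> true
  }
}
```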
##########
hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieFileGroupReader.java:
##########
@@ -108,7 +108,8 @@ public HoodieFileGroupReader(HoodieReaderContext<T> readerContext,
readerContext.setRecordMerger(this.recordMerger);
readerContext.setTablePath(tablePath);
readerContext.setLatestCommitTime(latestCommitTime);
- readerContext.setShouldMergeUseRecordPosition(shouldUseRecordPosition);
+ boolean isSkipMerge = props.getString("hoodie.datasource.merge.type", "payload_combine").equalsIgnoreCase("skip_merge");
Review Comment:
Could you move the declarations of `hoodie.datasource.merge.type`, `payload_combine`, and `skip_merge` to `HoodieReaderConfig`, and remove the redundant declarations in the Spark and Flink modules if possible:
Spark's `DataSourceOptions`
```
val REALTIME_SKIP_MERGE_OPT_VAL = "skip_merge"
val REALTIME_PAYLOAD_COMBINE_OPT_VAL = "payload_combine"
val REALTIME_MERGE: ConfigProperty[String] = ConfigProperty
  .key("hoodie.datasource.merge.type")
  .defaultValue(REALTIME_PAYLOAD_COMBINE_OPT_VAL)
  .withValidValues(REALTIME_SKIP_MERGE_OPT_VAL, REALTIME_PAYLOAD_COMBINE_OPT_VAL)
  .markAdvanced()
  .withDocumentation("For Snapshot query on merge on read table, control whether we invoke the record " +
    s"payload implementation to merge (${REALTIME_PAYLOAD_COMBINE_OPT_VAL}) or skip merging altogether " +
    s"(${REALTIME_SKIP_MERGE_OPT_VAL})")
```
Flink's `FlinkOptions`:
```
public static final String REALTIME_SKIP_MERGE = "skip_merge";
public static final String REALTIME_PAYLOAD_COMBINE = "payload_combine";

@AdvancedConfig
public static final ConfigOption<String> MERGE_TYPE = ConfigOptions
    .key("hoodie.datasource.merge.type")
    .stringType()
    .defaultValue(REALTIME_PAYLOAD_COMBINE)
    .withDescription("For Snapshot query on merge on read table. Use this key to define how the payloads are merged, in\n"
        + "1) skip_merge: read the base file records plus the log file records;\n"
        + "2) payload_combine: read the base file records first, for each record in base file, checks whether the key is in the\n"
        + "   log file records(combines the two records with same key for base and log file records), then read the left log file records");
```
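A consolidated declaration in `HoodieReaderConfig` might look like the following. This is a sketch only: the constant names and their placement are assumptions, and the builder calls simply mirror the Spark `ConfigProperty` declaration quoted above rather than any existing Hudi code.

```java
// Hypothetical addition to HoodieReaderConfig in hudi-common
// (constant names are illustrative, modeled on the Spark declaration above).
public static final String REALTIME_SKIP_MERGE = "skip_merge";
public static final String REALTIME_PAYLOAD_COMBINE = "payload_combine";

public static final ConfigProperty<String> MERGE_TYPE = ConfigProperty
    .key("hoodie.datasource.merge.type")
    .defaultValue(REALTIME_PAYLOAD_COMBINE)
    .withValidValues(REALTIME_SKIP_MERGE, REALTIME_PAYLOAD_COMBINE)
    .markAdvanced()
    .withDocumentation("For Snapshot query on merge on read table, control whether the record "
        + "payload implementation is invoked to merge base and log records (payload_combine) "
        + "or merging is skipped altogether (skip_merge).");
```

Spark's `DataSourceOptions` and Flink's `FlinkOptions` could then alias or delegate to this single definition so the key and its valid values are declared once.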
##########
hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieFileGroupReader.java:
##########
@@ -108,7 +108,8 @@ public HoodieFileGroupReader(HoodieReaderContext<T> readerContext,
readerContext.setRecordMerger(this.recordMerger);
readerContext.setTablePath(tablePath);
readerContext.setLatestCommitTime(latestCommitTime);
- readerContext.setShouldMergeUseRecordPosition(shouldUseRecordPosition);
+ boolean isSkipMerge = props.getString("hoodie.datasource.merge.type", "payload_combine").equalsIgnoreCase("skip_merge");
Review Comment:
Does Hive use the same config name?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]