yihua commented on code in PR #11943:
URL: https://github.com/apache/hudi/pull/11943#discussion_r1805664072
##########
hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieFileGroupReader.java:
##########
@@ -104,10 +98,9 @@ public HoodieFileGroupReader(HoodieReaderContext<T>
readerContext,
this.props = props;
this.start = start;
this.length = length;
- this.recordMergeMode = getRecordMergeMode(props);
HoodieTableConfig tableConfig = hoodieTableMetaClient.getTableConfig();
- this.recordMerger =
readerContext.getRecordMerger(tableConfig.getRecordMergerStrategy());
- readerContext.setRecordMerger(this.recordMerger);
+
readerContext.setRecordMerger(readerContext.getRecordMerger(tableConfig.getRecordMergeMode(),
+ tableConfig.getRecordMergerStrategy(),
props.getString("hoodie.datasource.write.record.merger.impl","")));
Review Comment:
It's good that all immutable merge configs come from table config now.
Should we persist `hoodie.datasource.write.record.merger.impl` in table config
too and make it mutable, so it also comes from the table config?
##########
hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieFileGroupReader.java:
##########
@@ -104,10 +98,9 @@ public HoodieFileGroupReader(HoodieReaderContext<T>
readerContext,
this.props = props;
this.start = start;
this.length = length;
- this.recordMergeMode = getRecordMergeMode(props);
HoodieTableConfig tableConfig = hoodieTableMetaClient.getTableConfig();
- this.recordMerger =
readerContext.getRecordMerger(tableConfig.getRecordMergerStrategy());
- readerContext.setRecordMerger(this.recordMerger);
+
readerContext.setRecordMerger(readerContext.getRecordMerger(tableConfig.getRecordMergeMode(),
+ tableConfig.getRecordMergerStrategy(),
props.getString("hoodie.datasource.write.record.merger.impl","")));
Review Comment:
Reference the config property declaration of
"hoodie.datasource.write.record.merger.impl".
##########
hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieFileGroupReaderSchemaHandler.java:
##########
@@ -162,6 +165,31 @@ private Schema generateRequiredSchema() {
return appendFieldsToSchemaDedupNested(requestedSchema, addedFields);
}
+ private static String[] getMandatoryFieldsForMerging(HoodieTableConfig cfg,
Option<HoodieRecordMerger> recordMerger) {
+ if (cfg.getRecordMergeMode() == RecordMergeMode.CUSTOM) {
+ return recordMerger.get().getMandatoryFieldsForMerging(cfg);
Review Comment:
Should the default implementation of `getMandatoryFieldsForMerging` return
all columns since custom merging may need all columns regardless of the request
schema (to be followed up in a separate fix)? Only the
`DefaultSparkRecordMerger` and `OverwriteWithLatestSparkRecordMerger`(and
counterpart classes for other engines) should overwrite the implementation to
return record key and optional preCombine field.
##########
hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieFileGroupReader.java:
##########
@@ -104,10 +98,9 @@ public HoodieFileGroupReader(HoodieReaderContext<T>
readerContext,
this.props = props;
this.start = start;
this.length = length;
- this.recordMergeMode = getRecordMergeMode(props);
HoodieTableConfig tableConfig = hoodieTableMetaClient.getTableConfig();
- this.recordMerger =
readerContext.getRecordMerger(tableConfig.getRecordMergerStrategy());
- readerContext.setRecordMerger(this.recordMerger);
+
readerContext.setRecordMerger(readerContext.getRecordMerger(tableConfig.getRecordMergeMode(),
+ tableConfig.getRecordMergerStrategy(),
props.getString("hoodie.datasource.write.record.merger.impl","")));
Review Comment:
Could we directly do
`readerContext.setRecordMerger(tableConfig.getRecordMergeMode(),
tableConfig.getRecordMergerStrategy(),
props.getString("hoodie.datasource.write.record.merger.impl",""))` by changing
the setter logic?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]