xiarixiaoyao commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r1022504653
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/commit/BaseMergeHelper.java:
##########
@@ -130,4 +145,48 @@ protected Void getResult() {
return null;
}
}
+
+ protected Iterator<GenericRecord> getRecordIterator(
+ HoodieTable<T, ?, ?, ?> table,
+ HoodieMergeHandle<T, ?, ?, ?> mergeHandle,
+ HoodieBaseFile baseFile,
+ HoodieFileReader<GenericRecord> reader,
+ Schema readSchema) throws IOException {
+ Option<InternalSchema> querySchemaOpt = SerDeHelper.fromJson(table.getConfig().getInternalSchema());
+ if (!querySchemaOpt.isPresent()) {
+ querySchemaOpt = new TableSchemaResolver(table.getMetaClient()).getTableInternalSchemaFromCommitMetadata();
+ }
+ boolean needToReWriteRecord = false;
+ Map<String, String> renameCols = new HashMap<>();
+ // TODO support bootstrap
+ if (querySchemaOpt.isPresent() && !baseFile.getBootstrapBaseFile().isPresent()) {
Review Comment:
@trushev
Can we avoid moving this code snippet? I do not think Flink schema evolution
needs to modify this code.
https://github.com/apache/hudi/pull/6358 and
https://github.com/apache/hudi/pull/7183 will optimize this code.
@danny0405
We need to check schema evolution for each base file.
Once multiple column changes have been made, different base files may have
different schemas, so we cannot use the current table schema to read these
files directly; doing so throws an exception.
For example, tableA has the schema a int, b string, c double, and the table
contains three files: f1, f2, f3.
We drop column c from tableA and add a new column d, and then update tableA,
but the update only rewrites f2 and f3; f1 is not touched.
The schemas are now:
```
schema1 from tableA: a int, b string, d long
schema2 from f2, f3: a int, b string, d long
schema3 from f1:     a int, b string, c double
```
We should not use schema1 to read f1.
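
A minimal sketch of the per-file check described above, to make the example
concrete. It is not the Hudi implementation: the class `PerFileSchemaCheck`
and the helpers `schema` and `needsRewrite` are hypothetical, and schemas are
simplified to column-name-to-type maps, whereas the real code works with
`InternalSchema`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in Hudi.
public class PerFileSchemaCheck {

  // Builds a column-name -> type map from alternating name/type arguments.
  static Map<String, String> schema(String... nameTypePairs) {
    Map<String, String> cols = new LinkedHashMap<>();
    for (int i = 0; i + 1 < nameTypePairs.length; i += 2) {
      cols.put(nameTypePairs[i], nameTypePairs[i + 1]);
    }
    return cols;
  }

  // A base file needs its records rewritten when the schema it was written
  // with differs (in column names or types) from the current table schema.
  static boolean needsRewrite(Map<String, String> fileSchema,
                              Map<String, String> tableSchema) {
    return !fileSchema.equals(tableSchema);
  }

  public static void main(String[] args) {
    // schema1: current schema of tableA after dropping c and adding d.
    Map<String, String> tableSchema = schema("a", "int", "b", "string", "d", "long");

    // Each base file keeps the schema it was last written with.
    Map<String, Map<String, String>> fileSchemas = new LinkedHashMap<>();
    fileSchemas.put("f1", schema("a", "int", "b", "string", "c", "double")); // untouched
    fileSchemas.put("f2", schema("a", "int", "b", "string", "d", "long"));   // rewritten
    fileSchemas.put("f3", schema("a", "int", "b", "string", "d", "long"));   // rewritten

    // The check runs once per base file, not once per table.
    for (Map.Entry<String, Map<String, String>> e : fileSchemas.entrySet()) {
      System.out.println(e.getKey() + " needs rewrite: "
          + needsRewrite(e.getValue(), tableSchema));
    }
  }
}
```

In this sketch only f1 is flagged, which is why the check cannot be done once
with the table schema and then reused for every file.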