xiarixiaoyao commented on code in PR #10727:
URL: https://github.com/apache/hudi/pull/10727#discussion_r1521039170
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/commit/HoodieMergeHelper.java:
##########
@@ -202,7 +202,9 @@ private Option<Function<HoodieRecord, HoodieRecord>> composeSchemaEvolutionTrans
     Schema newWriterSchema = AvroInternalSchemaConverter.convert(mergedSchema, writerSchema.getFullName());
     Schema writeSchemaFromFile = AvroInternalSchemaConverter.convert(writeInternalSchema, newWriterSchema.getFullName());
     boolean needToReWriteRecord = sameCols.size() != colNamesFromWriteSchema.size()
-        || SchemaCompatibility.checkReaderWriterCompatibility(newWriterSchema, writeSchemaFromFile).getType() == org.apache.avro.SchemaCompatibility.SchemaCompatibilityType.COMPATIBLE;
+        || !(SchemaCompatibility.checkReaderWriterCompatibility(newWriterSchema, writeSchemaFromFile).getType()
Review Comment:
The original logic has a performance issue: if the reader and writer schemas are compatible, I think we do not need to rewrite the entire record, since we can read the old parquet file with the new schema correctly. The original condition triggered the rewrite precisely when the schemas WERE compatible, so every compatible schema evolution paid for an unnecessary full-record rewrite.
@danny0405 @ThinkerLei
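
For illustration, a minimal standalone sketch of the corrected check against Avro's SchemaCompatibility API (the variable names mirror the diff above; the example schemas and class name are hypothetical, and the sameCols column-count check from the real condition is omitted):

import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.SchemaCompatibility;
import org.apache.avro.SchemaCompatibility.SchemaCompatibilityType;

public class SchemaCompatibilitySketch {
  public static void main(String[] args) {
    // Writer schema: what the existing parquet file was written with (hypothetical example).
    Schema writeSchemaFromFile = SchemaBuilder.record("rec").fields()
        .requiredString("id")
        .endRecord();

    // Reader schema: the evolved schema adding an optional column (hypothetical example).
    Schema newWriterSchema = SchemaBuilder.record("rec").fields()
        .requiredString("id")
        .optionalLong("ts")
        .endRecord();

    // The fix negates the original condition: a rewrite is needed only when
    // the reader canNOT consume the writer's data directly.
    boolean needToReWriteRecord =
        SchemaCompatibility.checkReaderWriterCompatibility(newWriterSchema, writeSchemaFromFile)
            .getType() != SchemaCompatibilityType.COMPATIBLE;

    // Prints false: the added optional column has a default value, so the old
    // file can be read with the new schema and no rewrite is required.
    System.out.println("needToReWriteRecord = " + needToReWriteRecord);
  }
}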