xiarixiaoyao commented on code in PR #10727:
URL: https://github.com/apache/hudi/pull/10727#discussion_r1524335926
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/commit/HoodieMergeHelper.java:
##########
@@ -202,7 +202,9 @@ private Option<Function<HoodieRecord, HoodieRecord>>
composeSchemaEvolutionTrans
Schema newWriterSchema =
AvroInternalSchemaConverter.convert(mergedSchema, writerSchema.getFullName());
Schema writeSchemaFromFile =
AvroInternalSchemaConverter.convert(writeInternalSchema,
newWriterSchema.getFullName());
boolean needToReWriteRecord = sameCols.size() !=
colNamesFromWriteSchema.size()
- ||
SchemaCompatibility.checkReaderWriterCompatibility(newWriterSchema,
writeSchemaFromFile).getType() ==
org.apache.avro.SchemaCompatibility.SchemaCompatibilityType.COMPATIBLE;
+ &&
SchemaCompatibility.checkReaderWriterCompatibility(newWriterSchema,
writeSchemaFromFile).getType()
+ ==
org.apache.avro.SchemaCompatibility.SchemaCompatibilityType.COMPATIBLE;
+
Review Comment:
@danny0405 @ThinkerLei
i think again.
SameCols. size() == colNamesFromWriteScheme. size() only happen in following
scence
The table has new columns, while the old columns have not been
changed(rename, type change).
eg:
```
write schema: a string, b int, c long
read schema: a string, b int, c long, d int
```
In this case
SameCols. size() == colNamesFromWriteScheme. size().
and, writeSchema is equivalent to a pruned readschema.
However, some versions of AVRO, such as AVRO 1.8. x , may report errors when
using pruned schemas to read AVRO files. (avro 1.10x has no such problem)
Therefore, even if sameCols. size() == colNamesFromWriteScheme. size(), we
still need to check the compatibility of the read-write schema. If it is
compatible, we can directly use this writeSchema to read avo data.
Therefore, maybe we can use following logic to avoid unnecessary rewrite.
```
boolean needToReWriteRecord = sameCols.size() !=
colNamesFromWriteSchema.size() ||
!SchemaCompatibility.checkReaderWriterCompatibility(newWriterSchema,
writeSchemaFromFile).getType() ==
org.apache.avro.SchemaCompatibility.SchemaCompatibilityType.COMPATIBLE
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]