yihua commented on code in PR #10727:
URL: https://github.com/apache/hudi/pull/10727#discussion_r1529152799
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/commit/HoodieMergeHelper.java:
##########
@@ -202,7 +202,9 @@ private Option<Function<HoodieRecord, HoodieRecord>> composeSchemaEvolutionTrans
       Schema newWriterSchema = AvroInternalSchemaConverter.convert(mergedSchema, writerSchema.getFullName());
       Schema writeSchemaFromFile = AvroInternalSchemaConverter.convert(writeInternalSchema, newWriterSchema.getFullName());
       boolean needToReWriteRecord = sameCols.size() != colNamesFromWriteSchema.size()
-          || SchemaCompatibility.checkReaderWriterCompatibility(newWriterSchema, writeSchemaFromFile).getType() == org.apache.avro.SchemaCompatibility.SchemaCompatibilityType.COMPATIBLE;
+          && SchemaCompatibility.checkReaderWriterCompatibility(newWriterSchema, writeSchemaFromFile).getType()
+          == org.apache.avro.SchemaCompatibility.SchemaCompatibilityType.COMPATIBLE;
+
Review Comment:
@xiarixiaoyao This info is valuable. Basically, using a pruned schema to read
Avro records is supported on Avro 1.10 and above, but not on lower versions. I
see that Spark 3.2 and above and all Flink versions use Avro 1.10 and above. So
for these integrations, and others that rely on Avro 1.10 and above, we should
use the pruned schema to read log records to improve performance. I'll check
the new file group reader.
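
For reference, here's a minimal, self-contained sketch (plain Avro, not Hudi code; the record and field names are made up) of what "reading with a pruned schema" means: serialize with the full writer schema, then deserialize with a reader schema that keeps only the needed columns, letting Avro's schema resolution skip the rest. `SchemaCompatibility.checkReaderWriterCompatibility` reports such a pruned reader schema as `COMPATIBLE`. Per the note above, this pruned read is only reliable on Avro 1.10 and above.

```java
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.SchemaCompatibility;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.Decoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.Encoder;
import org.apache.avro.io.EncoderFactory;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

public class PrunedSchemaReadSketch {
  public static void main(String[] args) throws Exception {
    // Full writer schema with three fields (hypothetical names).
    Schema writerSchema = SchemaBuilder.record("rec").fields()
        .requiredString("key")
        .requiredLong("ts")
        .requiredString("payload")
        .endRecord();

    // Pruned reader schema: only the columns the query actually needs.
    Schema prunedSchema = SchemaBuilder.record("rec").fields()
        .requiredString("key")
        .requiredLong("ts")
        .endRecord();

    // Avro reports the pruned reader schema as compatible with the writer schema.
    SchemaCompatibility.SchemaPairCompatibility compat =
        SchemaCompatibility.checkReaderWriterCompatibility(prunedSchema, writerSchema);
    System.out.println(compat.getType()); // COMPATIBLE

    // Serialize one record with the full writer schema.
    GenericRecord record = new GenericData.Record(writerSchema);
    record.put("key", "k1");
    record.put("ts", 123L);
    record.put("payload", "ignored-by-pruned-read");
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    Encoder encoder = EncoderFactory.get().binaryEncoder(out, null);
    new GenericDatumWriter<GenericRecord>(writerSchema).write(record, encoder);
    encoder.flush();

    // Deserialize with (writerSchema, prunedSchema): the extra field is skipped on read.
    Decoder decoder = DecoderFactory.get()
        .binaryDecoder(new ByteArrayInputStream(out.toByteArray()), null);
    GenericRecord pruned =
        new GenericDatumReader<GenericRecord>(writerSchema, prunedSchema).read(null, decoder);
    System.out.println(pruned); // {"key": "k1", "ts": 123}
  }
}
```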