xiarixiaoyao commented on code in PR #10727:
URL: https://github.com/apache/hudi/pull/10727#discussion_r1524366492
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/commit/HoodieMergeHelper.java:
##########
@@ -202,7 +202,9 @@ private Option<Function<HoodieRecord, HoodieRecord>> composeSchemaEvolutionTrans
       Schema newWriterSchema = AvroInternalSchemaConverter.convert(mergedSchema, writerSchema.getFullName());
       Schema writeSchemaFromFile = AvroInternalSchemaConverter.convert(writeInternalSchema, newWriterSchema.getFullName());
       boolean needToReWriteRecord = sameCols.size() != colNamesFromWriteSchema.size()
-          || SchemaCompatibility.checkReaderWriterCompatibility(newWriterSchema, writeSchemaFromFile).getType() == org.apache.avro.SchemaCompatibility.SchemaCompatibilityType.COMPATIBLE;
+          && SchemaCompatibility.checkReaderWriterCompatibility(newWriterSchema, writeSchemaFromFile).getType()
+          == org.apache.avro.SchemaCompatibility.SchemaCompatibilityType.COMPATIBLE;
+
Review Comment:
@danny0405
This spot also raises an additional question.
Currently, when we read a MOR table, we pass the full schema when reading
the Avro logs. Even if the query selects only one column, if the table has 100 rows of
Avro logs, reading them with the full schema and building the BitCask map consumes
a lot of memory, and performance suffers.
Our Avro version has now been upgraded to 1.10.x, so we
can pass pruned schemas directly when reading logs. With a pruned schema, reading the logs
and building the BitCask map is much faster and uses much less memory.
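
To illustrate the idea, here is a minimal toy sketch of why projecting records onto the queried columns before buffering them shrinks the in-memory map. It uses only plain Java collections; it is not Hudi or Avro code, and every name in it (`PrunedSchemaSketch`, `project`, `estimateSize`, `bufferedSize`, `sampleRecords`) is hypothetical. The size estimate is a crude character count, not a real memory measurement.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class PrunedSchemaSketch {

    // Keep only the requested columns (toy stand-in for reading with a pruned schema).
    static Map<String, String> project(Map<String, String> record, Set<String> columns) {
        Map<String, String> pruned = new LinkedHashMap<>();
        for (String col : columns) {
            if (record.containsKey(col)) {
                pruned.put(col, record.get(col));
            }
        }
        return pruned;
    }

    // Crude payload estimate: total characters across column names and values.
    static long estimateSize(Map<String, String> record) {
        long size = 0;
        for (Map.Entry<String, String> e : record.entrySet()) {
            size += e.getKey().length() + e.getValue().length();
        }
        return size;
    }

    // Buffer records by key, as a merge step might, and return the estimated buffered size.
    // A null column set means "full schema": records are buffered unpruned.
    static long bufferedSize(List<Map<String, String>> logRecords, String keyColumn, Set<String> columns) {
        Map<String, Map<String, String>> buffer = new HashMap<>();
        for (Map<String, String> record : logRecords) {
            Map<String, String> kept = (columns == null) ? record : project(record, columns);
            buffer.put(record.get(keyColumn), kept);
        }
        long total = 0;
        for (Map<String, String> r : buffer.values()) {
            total += estimateSize(r);
        }
        return total;
    }

    // Generate synthetic records with an "id" key column plus columns c1..cN.
    static List<Map<String, String>> sampleRecords(int rows, int cols) {
        List<Map<String, String>> records = new ArrayList<>();
        for (int i = 0; i < rows; i++) {
            Map<String, String> record = new LinkedHashMap<>();
            record.put("id", "key-" + i);
            for (int j = 1; j <= cols; j++) {
                record.put("c" + j, "value-" + i + "-" + j);
            }
            records.add(record);
        }
        return records;
    }

    public static void main(String[] args) {
        List<Map<String, String>> records = sampleRecords(100, 20);
        Set<String> queried = new HashSet<>(Arrays.asList("id", "c1"));
        long full = bufferedSize(records, "id", null);
        long pruned = bufferedSize(records, "id", queried);
        System.out.println("full schema estimate:   " + full);
        System.out.println("pruned schema estimate: " + pruned);
    }
}
```

In the real reader the pruning would happen during Avro decoding, by handing the log reader the pruned schema, so the unneeded fields are never materialized at all; the toy above only models the effect on the buffered map.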
Apologies, I cannot paste the test screenshots for company information
security reasons.
Test: Presto reading Hudi logs.
Passing the full schema, we see the following log:
Total size in bytes of MemoryBasedMap in ExternalSpillableMap => 712,956,000
final query time: 35672ms
Passing the pruned schema:
Total size in bytes of MemoryBasedMap in ExternalSpillableMap => 45,500,000
final query time: 13373ms