yihua commented on code in PR #10727:
URL: https://github.com/apache/hudi/pull/10727#discussion_r1529152799
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/commit/HoodieMergeHelper.java:
##########
@@ -202,7 +202,9 @@ private Option<Function<HoodieRecord, HoodieRecord>> composeSchemaEvolutionTrans
       Schema newWriterSchema = AvroInternalSchemaConverter.convert(mergedSchema, writerSchema.getFullName());
       Schema writeSchemaFromFile = AvroInternalSchemaConverter.convert(writeInternalSchema, newWriterSchema.getFullName());
       boolean needToReWriteRecord = sameCols.size() != colNamesFromWriteSchema.size()
-          || SchemaCompatibility.checkReaderWriterCompatibility(newWriterSchema, writeSchemaFromFile).getType() == org.apache.avro.SchemaCompatibility.SchemaCompatibilityType.COMPATIBLE;
+          && SchemaCompatibility.checkReaderWriterCompatibility(newWriterSchema, writeSchemaFromFile).getType()
+          == org.apache.avro.SchemaCompatibility.SchemaCompatibilityType.COMPATIBLE;
+
Review Comment:
@xiarixiaoyao This info is valuable. Basically, using a pruned schema to read
Avro records is supported on Avro 1.10 and above, but not on lower versions. I
see that Spark 3.2 and above and all Flink versions use Avro 1.10 and above. So
for these integrations, and others that rely on Avro 1.10 and above, we should
use the pruned schema to read log records to improve performance. I'll check
the new file group reader.
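
For reference, here's a minimal, self-contained sketch (plain Avro, not Hudi code; the record and field names are made up) of what "reading with a pruned schema" means: serialize with the full writer schema, then deserialize with a reader schema that keeps only the needed columns, letting Avro's schema resolution skip the rest. `SchemaCompatibility.checkReaderWriterCompatibility` reports such a pruned reader schema as `COMPATIBLE`. Per the note above, this pruned read is only reliable on Avro 1.10 and above.

```java
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.SchemaCompatibility;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.Decoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.Encoder;
import org.apache.avro.io.EncoderFactory;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

public class PrunedSchemaReadSketch {
  public static void main(String[] args) throws Exception {
    // Full writer schema with three fields (hypothetical names).
    Schema writerSchema = SchemaBuilder.record("rec").fields()
        .requiredString("key")
        .requiredLong("ts")
        .requiredString("payload")
        .endRecord();

    // Pruned reader schema: only the columns the query actually needs.
    Schema prunedSchema = SchemaBuilder.record("rec").fields()
        .requiredString("key")
        .requiredLong("ts")
        .endRecord();

    // Avro reports the pruned reader schema as compatible with the writer schema.
    SchemaCompatibility.SchemaPairCompatibility compat =
        SchemaCompatibility.checkReaderWriterCompatibility(prunedSchema, writerSchema);
    System.out.println(compat.getType()); // COMPATIBLE

    // Serialize one record with the full writer schema.
    GenericRecord record = new GenericData.Record(writerSchema);
    record.put("key", "k1");
    record.put("ts", 123L);
    record.put("payload", "ignored-by-pruned-read");
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    Encoder encoder = EncoderFactory.get().binaryEncoder(out, null);
    new GenericDatumWriter<GenericRecord>(writerSchema).write(record, encoder);
    encoder.flush();

    // Deserialize with (writerSchema, prunedSchema): the extra field is skipped on read.
    Decoder decoder = DecoderFactory.get()
        .binaryDecoder(new ByteArrayInputStream(out.toByteArray()), null);
    GenericRecord pruned =
        new GenericDatumReader<GenericRecord>(writerSchema, prunedSchema).read(null, decoder);
    System.out.println(pruned); // {"key": "k1", "ts": 123}
  }
}
```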