xiarixiaoyao commented on code in PR #10727:
URL: https://github.com/apache/hudi/pull/10727#discussion_r1521039170
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/commit/HoodieMergeHelper.java:
##########
@@ -202,7 +202,9 @@ private Option<Function<HoodieRecord, HoodieRecord>> composeSchemaEvolutionTrans
     Schema newWriterSchema = AvroInternalSchemaConverter.convert(mergedSchema, writerSchema.getFullName());
     Schema writeSchemaFromFile = AvroInternalSchemaConverter.convert(writeInternalSchema, newWriterSchema.getFullName());
     boolean needToReWriteRecord = sameCols.size() != colNamesFromWriteSchema.size()
-        || SchemaCompatibility.checkReaderWriterCompatibility(newWriterSchema, writeSchemaFromFile).getType() == org.apache.avro.SchemaCompatibility.SchemaCompatibilityType.COMPATIBLE;
+        || !(SchemaCompatibility.checkReaderWriterCompatibility(newWriterSchema, writeSchemaFromFile).getType()
Review Comment:
The original logic has a performance issue: if the reader and writer schemas are compatible, I think we do not need to rewrite the entire record, since we can read the old parquet file with the new schema correctly. The original condition triggered the rewrite precisely when the schemas WERE compatible, so every compatible schema evolution paid for an unnecessary full-record rewrite.
@danny0405 @ThinkerLei
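
For illustration, a minimal standalone sketch of the corrected check against Avro's SchemaCompatibility API (the variable names mirror the diff above; the example schemas and class name are hypothetical, and the sameCols column-count check from the real condition is omitted):

import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.SchemaCompatibility;
import org.apache.avro.SchemaCompatibility.SchemaCompatibilityType;

public class SchemaCompatibilitySketch {
  public static void main(String[] args) {
    // Writer schema: what the existing parquet file was written with (hypothetical example).
    Schema writeSchemaFromFile = SchemaBuilder.record("rec").fields()
        .requiredString("id")
        .endRecord();

    // Reader schema: the evolved schema adding an optional column (hypothetical example).
    Schema newWriterSchema = SchemaBuilder.record("rec").fields()
        .requiredString("id")
        .optionalLong("ts")
        .endRecord();

    // The fix negates the original condition: a rewrite is needed only when
    // the reader canNOT consume the writer's data directly.
    boolean needToReWriteRecord =
        SchemaCompatibility.checkReaderWriterCompatibility(newWriterSchema, writeSchemaFromFile)
            .getType() != SchemaCompatibilityType.COMPATIBLE;

    // Prints false: the added optional column has a default value, so the old
    // file can be read with the new schema and no rewrite is required.
    System.out.println("needToReWriteRecord = " + needToReWriteRecord);
  }
}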