Re: [PR] perf(common): Avoid per-record HoodieSchema rebuilds on Avro read/merge paths [hudi]

via GitHub Wed, 17 Jun 2026 01:03:25 -0700


voonhous commented on code in PR #18967:
URL: https://github.com/apache/hudi/pull/18967#discussion_r3426538757



##########
hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/SparkFileFormatInternalRecordContext.scala:
##########
@@ -47,7 +47,7 @@ trait SparkFileFormatInternalRecordContext extends 
BaseSparkInternalRecordContex
    * @return An [[InternalRow]].
    */
   override def convertAvroRecord(avroRecord: IndexedRecord): InternalRow = {
-    val schema = HoodieSchema.fromAvroSchema(avroRecord.getSchema)
+    val schema = AvroToHoodieSchemaCache.intern(avroRecord.getSchema)

Review Comment:
   This was on the folded version, which I reverted (see the 
`AvroToHoodieSchemaCache` thread above: 
https://github.com/apache/hudi/pull/18967#discussion_r3425187989). 
`HoodieSchema.fromAvroSchema` is back to the plain converter and no longer 
interns, so the two are not equivalent:
   
   - `fromAvroSchema(...)` - raw conversion, a fresh instance per call. It has 
to stay non-interning: `HoodieSchemaCompatibilityChecker` memoizes by 
schema-instance identity, and interning made it under-report per-field 
incompatibilities (broke 
`TestHoodieSchemaUtils#testIllegalPromotionsBetweenPrimitives`).
   - `AvroToHoodieSchemaCache.intern(...)` - the interning entry point, for the 
per-record hot paths.
   
   So we cannot collapse onto a single interning factory without breaking the 
compat checker; the split is intentional - intern on the per-record paths, 
plain `fromAvroSchema` elsewhere.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] perf(common): Avoid per-record HoodieSchema rebuilds on Avro read/merge paths [hudi]

Reply via email to