voonhous commented on code in PR #18967:
URL: https://github.com/apache/hudi/pull/18967#discussion_r3426538757
##########
hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/SparkFileFormatInternalRecordContext.scala:
##########
@@ -47,7 +47,7 @@ trait SparkFileFormatInternalRecordContext extends
BaseSparkInternalRecordContex
* @return An [[InternalRow]].
*/
override def convertAvroRecord(avroRecord: IndexedRecord): InternalRow = {
- val schema = HoodieSchema.fromAvroSchema(avroRecord.getSchema)
+ val schema = AvroToHoodieSchemaCache.intern(avroRecord.getSchema)
Review Comment:
This was on the folded version, which I reverted (see the
`AvroToHoodieSchemaCache` thread above:
https://github.com/apache/hudi/pull/18967#discussion_r3425187989).
`HoodieSchema.fromAvroSchema` is back to the plain converter and no longer
interns, so the two are not equivalent:
- `fromAvroSchema(...)` - raw conversion, a fresh instance per call. It has
to stay non-interning: `HoodieSchemaCompatibilityChecker` memoizes by
schema-instance identity, and interning made it under-report per-field
incompatibilities (broke
`TestHoodieSchemaUtils#testIllegalPromotionsBetweenPrimitives`).
- `AvroToHoodieSchemaCache.intern(...)` - the interning entry point, for the
per-record hot paths.
So we cannot collapse onto a single interning factory without breaking the
compat checker; the split is intentional - intern on the per-record paths,
plain `fromAvroSchema` elsewhere.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]