voonhous commented on code in PR #18967:
URL: https://github.com/apache/hudi/pull/18967#discussion_r3407670210
##########
hudi-common/src/main/java/org/apache/hudi/common/schema/HoodieSchemaCache.java:
##########
@@ -36,6 +37,14 @@ public class HoodieSchemaCache {
private static final LoadingCache<HoodieSchema, HoodieSchema> SCHEMA_CACHE =
Caffeine.newBuilder().weakValues().maximumSize(1024).build(k -> k);
+ // Avro-schema-keyed view onto the cache above for per-record call sites: weakKeys gives
+ // identity-based lookups (records of one file share the same Schema instance), so the hot path
+ // is a single cache hit with no wrapper allocation or type dispatch. Misses convert and then
+ // value-intern, so equal but distinct Avro schema instances still converge on one canonical
+ // HoodieSchema.
+ private static final LoadingCache<Schema, HoodieSchema> AVRO_SCHEMA_CACHE =
Review Comment:
Done - extracted into a dedicated `AvroToHoodieSchemaCache` class (in
`org.apache.hudi.common.schema`). It holds the Avro-`Schema`-keyed `weakKeys`
cache and on a miss converts + value-interns through `HoodieSchemaCache`, so
equal-but-distinct Avro schema instances still converge on one canonical
`HoodieSchema`. `HoodieSchemaCache` is back to interning `HoodieSchema` only,
and `AvroRecordContext` now calls the new class. (Named to avoid clashing with
the existing `org.apache.hudi.avro.AvroSchemaCache`, which is Avro -> Avro.)
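
For readers following along, here is a rough, stdlib-only sketch of the two-tier idea described above. It is not the code in this PR. `SourceSchema`, `CanonicalSchema`, and `TwoTierSchemaCacheSketch` are hypothetical stand-ins for Avro `Schema`, `HoodieSchema`, and the new class, and an `IdentityHashMap` stands in for the Caffeine `weakKeys` cache. Unlike `weakKeys`, it holds strong references and is never evicted, so it only illustrates the lookup behaviour.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class TwoTierSchemaCacheSketch {

  // Hypothetical stand-in for an Avro Schema: equal when the JSON text is equal.
  static final class SourceSchema {
    final String json;
    SourceSchema(String json) { this.json = json; }
    @Override public boolean equals(Object o) {
      return o instanceof SourceSchema && ((SourceSchema) o).json.equals(json);
    }
    @Override public int hashCode() { return json.hashCode(); }
  }

  // Hypothetical stand-in for HoodieSchema, also compared by value.
  static final class CanonicalSchema {
    final String json;
    CanonicalSchema(String json) { this.json = json; }
    @Override public boolean equals(Object o) {
      return o instanceof CanonicalSchema && ((CanonicalSchema) o).json.equals(json);
    }
    @Override public int hashCode() { return json.hashCode(); }
  }

  // Tier 2: value-based interner (assumption: plays the role of HoodieSchemaCache).
  static final Map<CanonicalSchema, CanonicalSchema> INTERNER = new ConcurrentHashMap<>();

  // Tier 1: identity-keyed lookup. Strong references here, unlike Caffeine weakKeys.
  static final Map<SourceSchema, CanonicalSchema> BY_IDENTITY =
      Collections.synchronizedMap(new IdentityHashMap<SourceSchema, CanonicalSchema>());

  static CanonicalSchema intern(CanonicalSchema schema) {
    CanonicalSchema previous = INTERNER.putIfAbsent(schema, schema);
    return previous != null ? previous : schema;
  }

  static CanonicalSchema get(SourceSchema source) {
    // Hot path: same instance seen before, so a single identity lookup suffices.
    CanonicalSchema hit = BY_IDENTITY.get(source);
    if (hit != null) {
      return hit;
    }
    // Miss: convert, then value-intern so equal schemas share one canonical object.
    CanonicalSchema canonical = intern(new CanonicalSchema(source.json));
    BY_IDENTITY.put(source, canonical);
    return canonical;
  }

  public static void main(String[] args) {
    SourceSchema a = new SourceSchema("{\"type\":\"string\"}");
    SourceSchema b = new SourceSchema("{\"type\":\"string\"}"); // equal, distinct instance
    System.out.println(get(a) == get(a)); // true: identity hit on the second call
    System.out.println(get(a) == get(b)); // true: the miss for b interns to the same value
  }
}
```

The get-then-put on a miss is not atomic, but racing threads intern to the same canonical instance, so the worst case in this sketch is a redundant conversion.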