danny0405 commented on code in PR #18967:
URL: https://github.com/apache/hudi/pull/18967#discussion_r3400272077


##########
hudi-common/src/main/java/org/apache/hudi/avro/AvroRecordContext.java:
##########
@@ -70,7 +71,10 @@ public AvroRecordContext() {
   public static Object getFieldValueFromIndexedRecord(
       IndexedRecord record,
       String fieldName) {
-    HoodieSchema currentSchema = HoodieSchema.fromAvroSchema(record.getSchema());
+    // Interning returns the canonical wrapper for this schema, whose lazily built field list and
+    // field map survive across calls, so the per-record cost is a cache hit instead of an
+    // O(schema width) wrapper rebuild.
+    HoodieSchema currentSchema = HoodieSchemaCache.intern(HoodieSchema.fromAvroSchema(record.getSchema()));

Review Comment:
   `HoodieSchema.fromAvroSchema(...)` still runs on every record and does a lot of
   work per call. We should optimize it, probably with an Avro schema to Hudi schema cache.
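
   A rough sketch of the kind of cache meant here, not a proposal for the final API. `WrapperCache` and the demo class are hypothetical names, and `String`/`StringBuilder` stand in for the Avro schema and the `HoodieSchema` wrapper so the snippet compiles without Avro or Hudi on the classpath. It assumes the key's `hashCode`/`equals` are cheap on the hit path, which would need to be checked for the real schema type.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical helper (not an existing Hudi class): memoizes an expensive
// source -> wrapper conversion so it runs once per distinct source.
final class WrapperCache<S, W> {
  private final Map<S, W> cache = new ConcurrentHashMap<>();
  private final Function<S, W> converter;

  WrapperCache(Function<S, W> converter) {
    this.converter = converter;
  }

  // Returns the cached wrapper; converts only when the source has no entry yet.
  W get(S source) {
    W cached = cache.get(source); // hit path: plain lookup, no conversion
    return cached != null ? cached : cache.computeIfAbsent(source, converter);
  }
}

final class WrapperCacheSketch {
  public static void main(String[] args) {
    AtomicInteger conversions = new AtomicInteger();
    // Stand-in for the Avro -> Hudi schema conversion; counts how often it runs.
    WrapperCache<String, StringBuilder> cache = new WrapperCache<>(raw -> {
      conversions.incrementAndGet();
      return new StringBuilder(raw);
    });

    StringBuilder first = cache.get("record-schema");
    StringBuilder second = cache.get("record-schema");

    System.out.println(first == second);   // true: same wrapper instance is reused
    System.out.println(conversions.get()); // 1: the conversion ran only once
  }
}
```

   With something along these lines, the per-record path would look up the wrapper by the Avro schema directly instead of building a new wrapper first and interning it afterwards.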



