Re: [PR] feat(schema): phase 17 - Remove AvroSchemaUtils usage (part 2) [hudi]

via GitHub Mon, 22 Dec 2025 04:35:11 -0800


voonhous commented on code in PR #17581:
URL: https://github.com/apache/hudi/pull/17581#discussion_r2639740019



##########
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HiveTypeUtils.java:
##########
@@ -259,22 +260,22 @@ private static TypeInfo generateTypeInfoWorker(Schema schema,
     }
   }
 
-  private static TypeInfo generateRecordTypeInfo(Schema schema,
-                                                 Set<Schema> seenSchemas) throws AvroSerdeException {
-    assert schema.getType().equals(Schema.Type.RECORD);
+  private static TypeInfo generateRecordTypeInfo(HoodieSchema schema,
+                                                 Set<HoodieSchema> seenSchemas) throws AvroSerdeException {
+    ValidationUtils.checkArgument(schema.getType() == RECORD, () -> schema + " is not a RECORD");
 
     if (seenSchemas == null) {
-      seenSchemas = Collections.newSetFromMap(new IdentityHashMap<Schema, Boolean>());
+      seenSchemas = Collections.newSetFromMap(new IdentityHashMap<>());

Review Comment:
   `IdentityHashMap` is used here only because we want object/reference 
equality: two entries match only if they are the same instance, i.e. stored at 
the same memory address, not merely equal by value.
   
   The code appears to be detecting **circular/recursive** references during 
schema traversal, so it needs to track whether it has already visited this 
EXACT schema object instance in the current traversal.
   
   So I don't think we can change it to a `HashSet`.
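
To illustrate the difference, here is a minimal, self-contained sketch. `FakeSchema` and `countDistinct` are hypothetical stand-ins invented for this example, not Hudi or Avro classes; the sketch assumes the schema type has value-based `equals`/`hashCode`, which is the case where the two kinds of set diverge.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.IdentityHashMap;
import java.util.Objects;
import java.util.Set;

public class IdentitySetSketch {
  // Hypothetical stand-in for a schema type with value-based equals/hashCode.
  static final class FakeSchema {
    final String name;

    FakeSchema(String name) {
      this.name = name;
    }

    @Override
    public boolean equals(Object o) {
      return o instanceof FakeSchema && ((FakeSchema) o).name.equals(name);
    }

    @Override
    public int hashCode() {
      return Objects.hash(name);
    }
  }

  // Adds every schema to the given "seen" set and reports how many it kept.
  static int countDistinct(Set<FakeSchema> seen, FakeSchema... schemas) {
    for (FakeSchema s : schemas) {
      seen.add(s);
    }
    return seen.size();
  }

  public static void main(String[] args) {
    FakeSchema a = new FakeSchema("rec");
    FakeSchema b = new FakeSchema("rec"); // equal by value, different instance

    // Identity-based set: membership is decided with ==.
    Set<FakeSchema> byIdentity = Collections.newSetFromMap(new IdentityHashMap<>());
    // Value-based set: membership is decided with equals()/hashCode().
    Set<FakeSchema> byValue = new HashSet<>();

    System.out.println(countDistinct(byIdentity, a, b, a)); // prints 2
    System.out.println(countDistinct(byValue, a, b, a));    // prints 1
  }
}
```

With the identity-based set, only a revisit of the very same instance counts as "seen". With a `HashSet`, a distinct but structurally equal instance would also be treated as already visited.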
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
