voonhous commented on code in PR #17581:
URL: https://github.com/apache/hudi/pull/17581#discussion_r2639740019
##########
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HiveTypeUtils.java:
##########
@@ -259,22 +260,22 @@ private static TypeInfo generateTypeInfoWorker(Schema schema,
}
}
- private static TypeInfo generateRecordTypeInfo(Schema schema,
- Set<Schema> seenSchemas) throws AvroSerdeException {
- assert schema.getType().equals(Schema.Type.RECORD);
+ private static TypeInfo generateRecordTypeInfo(HoodieSchema schema,
+ Set<HoodieSchema> seenSchemas) throws AvroSerdeException {
+ ValidationUtils.checkArgument(schema.getType() == RECORD, () -> schema + " is not a RECORD");
if (seenSchemas == null) {
- seenSchemas = Collections.newSetFromMap(new IdentityHashMap<Schema, Boolean>());
+ seenSchemas = Collections.newSetFromMap(new IdentityHashMap<>());
Review Comment:
`IdentityHashMap` is used here because we want object/reference equality:
two entries match only if they are the same object instance (compared with
`==`), not merely equal by value (compared with `equals`).
It looks like the code is detecting **circular/recursive** references during
schema traversal. It needs to track whether it has visited this EXACT schema
object instance during the current traversal.
So I don't think we can change it to a `HashSet`.
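
To illustrate the difference, here is a minimal sketch. `Node` and `countFirstVisits` are hypothetical names made up for this example, and `Node` is assumed to have value-based `equals`/`hashCode` (I have not checked whether `HoodieSchema` does). It is not the Hudi code, just the two set types side by side:

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.IdentityHashMap;
import java.util.Set;

class IdentitySetSketch {
  /** Hypothetical stand-in for a schema type with value-based equals/hashCode. */
  static final class Node {
    final String name;

    Node(String name) {
      this.name = name;
    }

    @Override
    public boolean equals(Object o) {
      return o instanceof Node && ((Node) o).name.equals(name);
    }

    @Override
    public int hashCode() {
      return name.hashCode();
    }
  }

  /** Adds each node to the set and returns how many were treated as not yet seen. */
  static int countFirstVisits(Set<Node> seen, Node... nodes) {
    int firstVisits = 0;
    for (Node n : nodes) {
      if (seen.add(n)) {
        firstVisits++;
      }
    }
    return firstVisits;
  }

  public static void main(String[] args) {
    Node a = new Node("rec");
    Node b = new Node("rec"); // equal to a by value, but a different instance

    // Identity-based set: membership is decided with ==.
    Set<Node> identitySet = Collections.newSetFromMap(new IdentityHashMap<>());
    // Value-based set: membership is decided with equals/hashCode.
    Set<Node> valueSet = new HashSet<>();

    System.out.println(countFirstVisits(identitySet, a, b, a)); // prints 2
    System.out.println(countFirstVisits(valueSet, a, b, a));    // prints 1
  }
}
```

With the identity-based set, only the second visit to `a` itself counts as a revisit. With a `HashSet`, `b` is also reported as already seen even though it is a separate instance, so a traversal that uses the set to detect cycles could flag a schema that is merely structurally equal to an earlier one.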