rahil-c commented on PR #18146: URL: https://github.com/apache/hudi/pull/18146#issuecomment-3956245664
@vinothchandar regarding this comment https://github.com/apache/hudi/pull/18146#discussion_r2850214370: you are correct, and thanks for catching it. This is a real problem. I tried reproducing the issue you mentioned with the following test:

```java
@Test
void testMultipleVectorColumnsWithDifferentDimensions() {
  // Two vectors with different dimensions both get the default FIXED name "vector",
  // but a different fixedSize
  HoodieSchema.Vector v128 = HoodieSchema.createVector(128);
  HoodieSchema.Vector v256 = HoodieSchema.createVector(256);
  List<HoodieSchemaField> fields = Arrays.asList(
      HoodieSchemaField.of("id", HoodieSchema.create(HoodieSchemaType.INT)),
      HoodieSchemaField.of("embedding_small", v128),
      HoodieSchemaField.of("embedding_large", v256)
  );

  // This should work: a table with two vector columns of different dimensions
  // is a valid use case (e.g., title embedding vs content embedding)
  HoodieSchema record = HoodieSchema.createRecord("TestRecord", null, null, fields);
  assertNotNull(record);

  // Verify both fields survive a JSON round-trip (schema serialization/parsing)
  String json = record.toString();
  HoodieSchema parsed = HoodieSchema.parse(json);
  assertNotNull(parsed.getAvroSchema().getField("embedding_small"));
  assertNotNull(parsed.getAvroSchema().getField("embedding_large"));
  assertVector(HoodieSchema.fromAvroSchema(parsed.getAvroSchema().getField("embedding_small").schema()),
      128, HoodieSchema.Vector.VectorElementType.FLOAT);
  assertVector(HoodieSchema.fromAvroSchema(parsed.getAvroSchema().getField("embedding_large").schema()),
      256, HoodieSchema.Vector.VectorElementType.FLOAT);
}
```

However, I hit an exception at `String json = record.toString();` because Avro enforces name uniqueness. I believe this is specific to Avro named types such as FIXED: both vector columns get the default FIXED name `vector` but with different sizes, and a single Avro schema cannot contain two different definitions of the same full name.
```
org.apache.avro.SchemaParseException: Can't redefine: vector
    at org.apache.avro.Schema$Names.put(Schema.java:1604)
    at org.apache.avro.Schema$NamedSchema.writeNameRef(Schema.java:846)
    at org.apache.avro.Schema$FixedSchema.toJson(Schema.java:1316)
    at org.apache.avro.Schema$RecordSchema.fieldsToJson(Schema.java:1041)
    at org.apache.avro.Schema$RecordSchema.toJson(Schema.java:1025)
    at org.apache.avro.Schema.toString(Schema.java:435)
    at org.apache.avro.Schema.toString(Schema.java:407)
    at org.apache.avro.Schema.toString(Schema.java:398)
    at org.apache.hudi.common.schema.HoodieSchema.toString(HoodieSchema.java:1221)
    at org.apache.hudi.common.schema.TestHoodieSchema.testMultipleVectorColumnsWithDifferentDimensions(TestHoodieSchema.java:1114)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at java.base/java.util.ArrayList.forEach(ArrayList.java:1541)
    at java.base/java.util.ArrayList.forEach(ArrayList.java:1541)
```
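To illustrate the mechanism in the stack trace (`FixedSchema.toJson` calling `NamedSchema.writeNameRef`, which calls `Names.put`), here is a simplified, self-contained model. It is plain Java, not Avro's actual code, and the names `FixedNameRegistry`, `FixedType` and `write` are hypothetical. As I understand Avro's serializer: the first time a named type is written, its full definition is emitted and the name is recorded; a later occurrence of an *equal* schema is written as a bare name reference; a *different* schema under the same name is rejected. The sizes below are illustrative only; I have not checked how `HoodieSchema.createVector` maps a dimension to a FIXED byte size.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical, simplified model of Avro's named-type bookkeeping during toJson.
// Not Avro's implementation; only meant to show why "Can't redefine" is thrown.
final class FixedNameRegistry {

  // Minimal stand-in for an Avro FIXED schema: a full name plus a byte size.
  static final class FixedType {
    final String name;
    final int size;

    FixedType(String name, int size) {
      this.name = Objects.requireNonNull(name);
      this.size = size;
    }

    @Override
    public boolean equals(Object o) {
      if (this == o) return true;
      if (!(o instanceof FixedType)) return false;
      FixedType other = (FixedType) o;
      return size == other.size && name.equals(other.name);
    }

    @Override
    public int hashCode() {
      return Objects.hash(name, size);
    }
  }

  // Names already defined while writing the current schema.
  private final Map<String, FixedType> seen = new HashMap<>();

  // Returns the JSON fragment written for this type, or throws on redefinition.
  String write(FixedType type) {
    FixedType existing = seen.get(type.name);
    if (existing == null) {
      // First occurrence: record the name and emit the full definition.
      seen.put(type.name, type);
      return "{\"type\":\"fixed\",\"name\":\"" + type.name + "\",\"size\":" + type.size + "}";
    }
    if (existing.equals(type)) {
      // Same name, identical definition: emit a name reference only.
      return "\"" + type.name + "\"";
    }
    // Same name, different definition (e.g. a different size): rejected.
    throw new IllegalStateException("Can't redefine: " + type.name);
  }

  public static void main(String[] args) {
    FixedNameRegistry sameDim = new FixedNameRegistry();
    System.out.println(sameDim.write(new FixedType("vector", 128)));
    System.out.println(sameDim.write(new FixedType("vector", 128)));

    FixedNameRegistry mixedDims = new FixedNameRegistry();
    mixedDims.write(new FixedType("vector", 128));
    try {
      mixedDims.write(new FixedType("vector", 256));
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

If this model is right, it also explains why two vector columns with the *same* dimension would not hit the error: the second occurrence equals the first, so it is written as a reference to `vector` instead of a second definition.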
