voonhous commented on code in PR #18146:
URL: https://github.com/apache/hudi/pull/18146#discussion_r2867147565


##########
hudi-common/src/test/java/org/apache/hudi/common/schema/TestHoodieSchemaType.java:
##########
@@ -224,4 +227,26 @@ private static Schema createVariantSchemaForTest() {
     HoodieSchema.VariantLogicalType.variant().addToSchema(variantRecord);
     return variantRecord;
   }
+
+  /**
+   * Creates a vector schema manually using Avro APIs.
+   *
+   * @return a vector FIXED schema with VectorLogicalType metadata
+   */
+  private static Schema createVectorSchemaForTest() {
+    int dimension = 128;
+    String elementType = HoodieSchema.Vector.VectorElementType.FLOAT.name();
+    String storageBacking = 
HoodieSchema.Vector.StorageBacking.FIXED_BYTES.name();
+
+    int fixedSize = dimension * 4;
+    // Create FIXED schema directly
+    Schema vectorSchema = Schema.createFixed("vector", null, null, fixedSize);
+
+    // Apply VectorLogicalType with metadata
+    HoodieSchema.VectorLogicalType vectorLogicalType =
+        new HoodieSchema.VectorLogicalType(dimension, elementType, 
storageBacking);
+    vectorLogicalType.addToSchema(vectorSchema);
+
+    return vectorSchema;
+  }
 }

Review Comment:
   Let's also add a few more test cases to prevent regression:
   
   # Test string property type parsing:
   
   ```
   @Test
     void testVectorFromSchemaWithStringProperties() {
       // Manually craft a JSON schema where 'dimension' is a string rather 
than an integer
       String jsonSchema = "{"
           + "\"type\":\"fixed\","
           + "\"name\":\"vector_float_128\","
           + "\"size\":512,"
           + "\"logicalType\":\"vector\","
           + "\"dimension\":\"128\","
           + "\"elementType\":\"FLOAT\","
           + "\"storageBacking\":\"FIXED_BYTES\""
           + "}";
   
       Schema avroSchema = new Schema.Parser().parse(jsonSchema);
       HoodieSchema schema = HoodieSchema.fromAvroSchema(avroSchema);
   
       assertTrue(schema instanceof HoodieSchema.Vector);
       HoodieSchema.Vector vectorSchema = (HoodieSchema.Vector) schema;
       
       // Verify it correctly parsed the string "128" into the integer 128
       assertEquals(128, vectorSchema.getDimension());
       assertEquals(HoodieSchema.Vector.VectorElementType.FLOAT, 
vectorSchema.getVectorElementType());
     }
   ```
   
   # The Size Mismatch Validation Test
   
   You added validation to ensure `fixedSize == dimension * elementSize`. We 
should explicitly test that this logic fires if someone tries to feed Hudi a 
corrupted schema.
   
   ```
   @Test
     void testVectorSizeMismatchValidation() {
       // Dimension 10, FLOAT (4 bytes) -> expected fixed size is 40.
       // We intentionally create a FIXED schema with size 42.
       Schema fixedSchema = Schema.createFixed("bad_vector", null, null, 42);
       fixedSchema.addProp("logicalType", "vector");
       fixedSchema.addProp("dimension", 10);
       fixedSchema.addProp("elementType", "FLOAT");
       fixedSchema.addProp("storageBacking", "FIXED_BYTES");
   
       IllegalArgumentException ex = 
assertThrows(IllegalArgumentException.class,
           () -> HoodieSchema.fromAvroSchema(fixedSchema));
   
       assertTrue(ex.getMessage().contains("FIXED size mismatch"),
           "Should throw size mismatch error, got: " + ex.getMessage());
     }
   ```
   
   # The Invalid Element Type Test
   Nit test case where we check what happens if a schema comes in with an 
unknown element type like `VARCHAR` or a typo like FLAOT? 
`VectorElementType.fromString()` should handle it properly, but we should lock 
that behavior in with a test.
   
   ```
   @Test
     void testVectorUnknownElementType() {
       // Create a valid FIXED schema structurally (10 * 4 = 40 bytes)
       Schema fixedSchema = Schema.createFixed("bad_vector", null, null, 40);
       fixedSchema.addProp("logicalType", "vector");
       fixedSchema.addProp("dimension", 10);
       // Intentionally inject an invalid element type
       fixedSchema.addProp("elementType", "VARCHAR"); 
   
       IllegalArgumentException ex = 
assertThrows(IllegalArgumentException.class,
           () -> HoodieSchema.fromAvroSchema(fixedSchema));
   
       assertTrue(ex.getMessage().contains("Unknown element type: VARCHAR"),
           "Should reject unknown element types");
     }
   ```



##########
hudi-common/src/main/java/org/apache/hudi/common/schema/HoodieSchema.java:
##########
@@ -1719,6 +1987,81 @@ public void validate(Schema schema) {
     }
   }
 
+  static class VectorLogicalType extends LogicalType {
+    private static final String VECTOR_LOGICAL_TYPE_NAME = "vector";
+    private static final String PROP_DIMENSION = "dimension";
+    private static final String PROP_ELEMENT_TYPE = "elementType";
+    private static final String PROP_STORAGE_BACKING = "storageBacking";
+
+    private final int dimension;
+    private final String elementType;
+    private final String storageBacking;
+
+    public VectorLogicalType(int dimension, String elementType, String 
storageBacking) {
+      super(VectorLogicalType.VECTOR_LOGICAL_TYPE_NAME);
+      ValidationUtils.checkArgument(dimension > 0,
+          () -> "Vector dimension must be positive: " + dimension);
+      ValidationUtils.checkArgument(elementType != null && 
!elementType.isEmpty(),
+          () -> "Element type cannot be null or empty");
+      ValidationUtils.checkArgument(storageBacking != null && 
!storageBacking.isEmpty(),
+          () -> "Storage backing cannot be null or empty");
+
+      this.dimension = dimension;
+      this.elementType = elementType;
+      this.storageBacking = storageBacking;
+    }
+
+    public int getDimension() {
+      return dimension;
+    }
+
+    public String getElementType() {
+      return elementType;
+    }
+
+    public String getStorageBacking() {
+      return storageBacking;
+    }
+
+    @Override
+    public Schema addToSchema(Schema schema) {
+      super.addToSchema(schema);
+      schema.addProp(PROP_DIMENSION, dimension);
+      schema.addProp(PROP_ELEMENT_TYPE, elementType);
+      schema.addProp(PROP_STORAGE_BACKING, storageBacking);
+      return schema;
+    }
+  }
+
+  /**
+   * Factory for creating VectorLogicalType instances.
+   */
+  private static class VectorLogicalTypeFactory implements 
LogicalTypes.LogicalTypeFactory {
+    @Override
+    public LogicalType fromSchema(Schema schema) {
+      // Extract properties from schema
+      Object dimObj = schema.getObjectProp(VectorLogicalType.PROP_DIMENSION);
+      int dimension = dimObj instanceof Number ? ((Number) dimObj).intValue() 
: 0;

Review Comment:
   If the schema was generated by a system that serializes properties as 
strings (e.g., `{"dimension": "128"}), dimObj instanceof Number` will be false, 
and it will default to `0`. This will cause the `VectorLogicalType` constructor 
to throw an unhelpful message error message:
   
   ```
   IllegalArgumentException: Vector dimension must be positive: 0.
   ```
   
   We can defensively guard against this:
   
   ```
   Object dimObj = schema.getObjectProp(VectorLogicalType.PROP_DIMENSION);
   int dimension = 0;
   if (dimObj != null) {
     try {
       dimension = Integer.parseInt(String.valueOf(dimObj));
     } catch (NumberFormatException e) {
       throw new IllegalArgumentException("Invalid vector dimension property: " 
+ dimObj);
     }
   }
   ValidationUtils.checkArgument(dimension > 0, 
       () -> "Missing or invalid 'dimension' property in vector schema");
   ```
   
   



##########
hudi-common/src/main/java/org/apache/hudi/common/schema/HoodieSchema.java:
##########
@@ -1551,6 +1610,215 @@ public int hashCode() {
     }
   }
 
+  public static class Vector extends HoodieSchema {
+    private static final String DEFAULT_NAME = "vector";
+
+    /**
+     * Enum representing vector element data types.
+     */
+    public enum VectorElementType {
+      FLOAT(4),
+      DOUBLE(8),
+      INT8(1);
+
+      private final int elementSize;
+
+      VectorElementType(int elementSize) {
+        this.elementSize = elementSize;
+      }
+
+      /**
+       * Returns the byte size of a single element.
+       *
+       * @return number of bytes per element
+       */
+      public int getElementSize() {
+        return elementSize;
+      }
+
+      /**
+       * Converts a string to VectorElementType enum.
+       *
+       * @param name the element type name (e.g., "FLOAT", "DOUBLE", "INT8")
+       * @return the corresponding enum value
+       * @throws IllegalArgumentException if name is unknown
+       */
+      public static VectorElementType fromString(String name) {
+        for (VectorElementType type : values()) {
+          if (type.name().equalsIgnoreCase(name)) {
+            return type;
+          }
+        }
+        throw new IllegalArgumentException("Unknown element type: " + name);
+      }
+    }
+
+    /**
+     * Enum representing the physical storage format backing a vector.
+     */
+    public enum StorageBacking {
+      FIXED_BYTES;
+
+      /**
+       * Converts a string to StorageBacking enum.
+       *
+       * @param name the storage backing name (e.g., "FIXED_BYTES")
+       * @return the corresponding enum value
+       * @throws IllegalArgumentException if name is unknown
+       */
+      public static StorageBacking fromString(String name) {
+        for (StorageBacking b : values()) {
+          if (b.name().equalsIgnoreCase(name)) {
+            return b;
+          }
+        }
+        throw new IllegalArgumentException("Unknown storage backing: " + name);
+      }
+    }
+
+    private final int dimension;
+    private final VectorElementType elementType;
+    private final StorageBacking storageBacking;
+
+    /**
+     * Creates Vector from pre-built schema (used by factory methods).
+     *
+     * @param avroSchema the Avro schema to wrap, must be a valid Vector schema
+     * @throws IllegalArgumentException if avroSchema is null or not a valid 
Vector schema
+     */
+    private Vector(Schema avroSchema) {

Review Comment:
   Nit: Make this `package-private` instead of private.
   
   This is accessed via `HoodieSchema#fromSchema`. In older java versions prior 
to Java 11, the compiler generats synthetic accessor methods when an inner 
class (including anonymous inner classes) needed to access a private member 
(field, method, or constructor) of its enclosing outer class.
   
   You can ignore this if you want since newer java versions >= 11 doesn't 
generate synthetic accessor methods anymore. Just an FYI. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to