vinothchandar commented on code in PR #18146:
URL: https://github.com/apache/hudi/pull/18146#discussion_r2850231218


##########
hudi-common/src/main/java/org/apache/hudi/common/schema/HoodieSchema.java:
##########
@@ -1660,6 +1919,81 @@ public void validate(Schema schema) {
     }
   }
 
+  static class VectorLogicalType extends LogicalType {
+    private static final String VECTOR_LOGICAL_TYPE_NAME = "vector";
+    private static final String PROP_DIMENSION = "dimension";
+    private static final String PROP_ELEMENT_TYPE = "elementType";
+    private static final String PROP_STORAGE_BACKING = "storageBacking";
+
+    private final int dimension;
+    private final String elementType;
+    private final String storageBacking;
+
+    public VectorLogicalType(int dimension, String elementType, String 
storageBacking) {
+      super(VectorLogicalType.VECTOR_LOGICAL_TYPE_NAME);
+      ValidationUtils.checkArgument(dimension > 0,
+          () -> "Vector dimension must be positive: " + dimension);
+      ValidationUtils.checkArgument(elementType != null && 
!elementType.isEmpty(),
+          () -> "Element type cannot be null or empty");
+      ValidationUtils.checkArgument(storageBacking != null && 
!storageBacking.isEmpty(),
+          () -> "Storage backing cannot be null or empty");
+
+      this.dimension = dimension;
+      this.elementType = elementType;
+      this.storageBacking = storageBacking;
+    }
+
+    public int getDimension() {
+      return dimension;
+    }
+
+    public String getElementType() {
+      return elementType;
+    }
+
+    public String getStorageBacking() {
+      return storageBacking;
+    }
+
+    @Override
+    public Schema addToSchema(Schema schema) {
+      super.addToSchema(schema);
+      schema.addProp(PROP_DIMENSION, dimension);
+      schema.addProp(PROP_ELEMENT_TYPE, elementType);
+      schema.addProp(PROP_STORAGE_BACKING, storageBacking);
+      return schema;
+    }

Review Comment:
   I am okay for this PR. but again general code maintenance, we should have a 
standard `validate()` method for logical types. You can make the final call and 
resolve if you think this is sufficient



##########
hudi-common/src/main/java/org/apache/hudi/common/schema/HoodieSchema.java:
##########
@@ -632,6 +635,52 @@ public static HoodieSchema.Variant 
createVariantShredded(String name, String nam
     return new HoodieSchema.Variant(recordSchema);
   }
 
+  /**
+   * Creates Vector schema with default name and specified dimension.
+   *
+   * @param dimension vector dimension (must be > 0)
+   * @return new HoodieSchema.Vector
+   */
+  public static HoodieSchema.Vector createVector(int dimension) {
+    return createVector(null, dimension);
+  }
+
+  /**
+   * Creates Vector schema with custom name and dimension.
+   *
+   * @param name record name (null uses default "vector")
+   * @param dimension vector dimension (must be > 0)
+   * @return new HoodieSchema.Vector
+   */
+  public static HoodieSchema.Vector createVector(String name, int dimension) {

Review Comment:
   I see you have a overload below, that passes the type. I was flagging since 
there was no docs, and it was silently assuming float. So, with the elementType 
overload and the doc. We can resolve



##########
hudi-common/src/main/java/org/apache/hudi/common/schema/HoodieSchemaType.java:
##########
@@ -119,6 +119,8 @@ public enum HoodieSchemaType {
 
   VARIANT(Schema.Type.RECORD),
 
+  VECTOR(Schema.Type.FIXED),

Review Comment:
   Thanks for the detailed answer. So you are saying we define a new type. I 
agree, for dense vectors we are okay. My comment was around sparse vectors. or 
anything else (which it seems thereis nt)



##########
hudi-common/src/main/java/org/apache/hudi/common/schema/HoodieSchema.java:
##########
@@ -1492,6 +1541,216 @@ public int hashCode() {
     }
   }
 
+  public static class Vector extends HoodieSchema {
+    private static final String DEFAULT_NAME = "vector";
+
+    /**
+     * Enum representing vector element data types.
+     */
+    public enum VectorElementType {
+      FLOAT("FLOAT", 4),
+      DOUBLE("DOUBLE", 8),
+      INT8("INT8", 1);
+
+      private final String dataType;
+      private final int elementSize;
+
+      VectorElementType(String dataType, int elementSize) {
+        this.dataType = dataType;
+        this.elementSize = elementSize;
+      }
+
+      /**
+       * Returns the string representation for serialization.
+       *
+       * @return element type name
+       */
+      public String getDataType() {
+        return dataType;
+      }
+
+      /**
+       * Returns the byte size of a single element.
+       *
+       * @return number of bytes per element
+       */
+      public int getElementSize() {
+        return elementSize;
+      }
+
+      /**
+       * Converts a string to VectorElementType enum.
+       *
+       * @param name the element type name (e.g., "FLOAT", "DOUBLE", "INT8")
+       * @return the corresponding enum value
+       * @throws IllegalArgumentException if name is unknown
+       */
+      public static VectorElementType fromString(String name) {
+        for (VectorElementType type : values()) {
+          if (type.dataType.equalsIgnoreCase(name)) {
+            return type;
+          }
+        }
+        throw new IllegalArgumentException("Unknown element type: " + name);
+      }
+    }
+
+    // Storage backing types
+    public static final String STORAGE_BACKING_FIXED_BYTES = "FIXED_BYTES";
+
+    private final int dimension;
+    private final VectorElementType elementType;
+    private final String storageBacking;
+
+    /**
+     * Creates Vector from pre-built schema (used by factory methods).
+     *
+     * @param avroSchema the Avro schema to wrap, must be a valid Vector schema
+     * @throws IllegalArgumentException if avroSchema is null or not a valid 
Vector schema
+     */
+    private Vector(Schema avroSchema) {
+      super(avroSchema);
+
+      // Extract properties from LogicalType
+      LogicalType logicalType = avroSchema.getLogicalType();
+      if (!(logicalType instanceof VectorLogicalType)) {
+        throw new IllegalArgumentException(
+          "Schema must have VectorLogicalType, got: " + logicalType);
+      }
+
+      VectorLogicalType vectorLogicalType = (VectorLogicalType) logicalType;
+      this.dimension = vectorLogicalType.getDimension();
+      this.elementType = 
VectorElementType.fromString(vectorLogicalType.getElementType());
+      this.storageBacking = vectorLogicalType.getStorageBacking();
+
+      // Validate schema structure
+      validateVectorSchema(avroSchema);
+    }
+
+    @Override
+    public String getName() {
+      return "vector";
+    }
+
+    @Override
+    public HoodieSchemaType getType() {
+      return HoodieSchemaType.VECTOR;
+    }
+
+    /**
+     * Gets the byte size of a single element based on element type.
+     *
+     * @param elementType the element type
+     * @return number of bytes per element
+     */
+    private static int getElementSize(VectorElementType elementType) {
+      return elementType.getElementSize();
+    }
+
+    /**
+     * Creates vector schema with specified dimension and element type.
+     *
+     * @param name fixed type name (not null)
+     * @param dimension vector dimension (must be > 0)

Review Comment:
   okay. lets resolve when done



##########
hudi-common/src/main/java/org/apache/hudi/common/schema/HoodieSchema.java:
##########
@@ -632,6 +635,52 @@ public static HoodieSchema.Variant 
createVariantShredded(String name, String nam
     return new HoodieSchema.Variant(recordSchema);
   }
 
+  /**
+   * Creates Vector schema with default name and specified dimension.
+   *
+   * @param dimension vector dimension (must be > 0)
+   * @return new HoodieSchema.Vector
+   */
+  public static HoodieSchema.Vector createVector(int dimension) {
+    return createVector(null, dimension);

Review Comment:
   So, my question is about the side effects and any issues with multiple 
vector columns with same name co-existing in an avro schema.. 
   
   if this is broken, then we need to fix for other types too.  if this is a 
non-issue, lets resolve



##########
hudi-common/src/main/java/org/apache/hudi/common/schema/HoodieSchema.java:
##########
@@ -632,6 +635,52 @@ public static HoodieSchema.Variant 
createVariantShredded(String name, String nam
     return new HoodieSchema.Variant(recordSchema);
   }
 
+  /**
+   * Creates Vector schema with default name and specified dimension.
+   *
+   * @param dimension vector dimension (must be > 0)
+   * @return new HoodieSchema.Vector
+   */
+  public static HoodieSchema.Vector createVector(int dimension) {
+    return createVector(null, dimension);
+  }
+
+  /**
+   * Creates Vector schema with custom name and dimension.
+   *
+   * @param name record name (null uses default "vector")
+   * @param dimension vector dimension (must be > 0)
+   * @return new HoodieSchema.Vector
+   */
+  public static HoodieSchema.Vector createVector(String name, int dimension) {
+    return createVector(name, dimension, Vector.VectorElementType.FLOAT);
+  }
+
+  /**
+   * Creates Vector schema with custom dimension and element type.
+   *
+   * @param dimension vector dimension (must be > 0)
+   * @param elementType element type (use {@link 
Vector.VectorElementType#FLOAT} or {@link Vector.VectorElementType#DOUBLE})
+   * @return new HoodieSchema.Vector
+   */
+  public static HoodieSchema.Vector createVector(int dimension, 
Vector.VectorElementType elementType) {
+    return createVector(null, dimension, elementType);
+  }
+
+  /**
+   * Creates Vector schema with custom name, dimension, and element type.
+   *
+   * @param name record name (null uses default "vector")
+   * @param dimension vector dimension (must be > 0)
+   * @param elementType element type (use {@link 
Vector.VectorElementType#FLOAT} or {@link Vector.VectorElementType#DOUBLE})
+   * @return new HoodieSchema.Vector
+   */
+  public static HoodieSchema.Vector createVector(String name, int dimension, 
Vector.VectorElementType elementType) {
+    String vectorName = (name != null && !name.isEmpty()) ? name : 
Vector.DEFAULT_NAME;

Review Comment:
   From strictly code readability, we should try and wire in the name from a 
method that calls this - is that L681? , instead of special handling. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to