yihua commented on code in PR #18190:
URL: https://github.com/apache/hudi/pull/18190#discussion_r2801589430


##########
hudi-common/src/main/java/org/apache/hudi/common/schema/HoodieSchema.java:
##########
@@ -80,14 +80,40 @@
  * @since 1.2.0
  */
 public class HoodieSchema implements Serializable {
+  private static final long serialVersionUID = 1L;
 
   /**
    * Constant representing a null JSON value, equivalent to JsonProperties.NULL_VALUE.
    * This provides compatibility with Avro's JsonProperties while maintaining Hudi's API.
    */
   public static final Object NULL_VALUE = JsonProperties.NULL_VALUE;
   public static final HoodieSchema NULL_SCHEMA = HoodieSchema.create(HoodieSchemaType.NULL);
-  private static final long serialVersionUID = 1L;
+  /**
+   * Constant to use when attaching type metadata to external schema systems like Spark's StructType.
+   */
+  public static final String TYPE_METADATA_FIELD = "hudi_type";
+  public static final String TYPE_METADATA_PROPS_FIELD = "hudi_type_metadata";
+
+  /**
+   * Builds a comma-separated key=value metadata string from the given map.
+   * Example: {"vector.dimension": "128"} → "vector.dimension=128"
+   */
+  public static String buildTypeMetadata(Map<String, String> props) {
+    return props.entrySet().stream()
+        .map(e -> e.getKey() + "=" + e.getValue())
+        .collect(Collectors.joining(","));
+  }
+
+  /**

Review Comment:
   `parseTypeMetadata` will throw if a metadata entry is missing the `=` delimiter (e.g., `"vector.dimension"` without `=128`). Could this happen if a user passes in a malformed entry, or is this string only ever constructed internally?
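For illustration, a minimal sketch of what a stricter parser could look like, assuming the string is exactly what `buildTypeMetadata` emits (comma-separated `key=value` pairs, no escaping). The real `parseTypeMetadata` body isn't in this hunk, so the class and methods below are hypothetical stand-ins, not the PR's implementation; the point is that a malformed entry fails with a message naming the offending entry:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch class; not part of Hudi.
public class TypeMetadataSketch {

  // Same logic as buildTypeMetadata in the diff: joins entries as key=value, comma-separated.
  static String buildTypeMetadata(Map<String, String> props) {
    return props.entrySet().stream()
        .map(e -> e.getKey() + "=" + e.getValue())
        .collect(Collectors.joining(","));
  }

  // Hypothetical parser: rejects entries with no '=' or an empty key,
  // reporting both the bad entry and the full metadata string.
  static Map<String, String> parseTypeMetadata(String metadata) {
    Map<String, String> result = new LinkedHashMap<>();
    if (metadata == null || metadata.isEmpty()) {
      return result;
    }
    for (String entry : metadata.split(",")) {
      int idx = entry.indexOf('=');
      if (idx <= 0) {
        throw new IllegalArgumentException(
            "Malformed type metadata entry '" + entry + "' in '" + metadata + "'; expected key=value");
      }
      // Split on the first '=' only, so a value may itself contain '='.
      result.put(entry.substring(0, idx).trim(), entry.substring(idx + 1).trim());
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, String> props = new LinkedHashMap<>();
    props.put("vector.dimension", "128");
    String encoded = buildTypeMetadata(props);
    System.out.println(encoded);                    // vector.dimension=128
    System.out.println(parseTypeMetadata(encoded)); // {vector.dimension=128}
    try {
      parseTypeMetadata("vector.dimension");
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Since `buildTypeMetadata` does not escape anything, a key or value containing `,` would not round-trip through this format regardless of how the parser handles errors.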



##########
hudi-client/hudi-spark-client/src/main/scala/org/apache/spark/sql/avro/HoodieSparkSchemaConverters.scala:
##########
@@ -78,6 +79,25 @@ object HoodieSparkSchemaConverters {
         HoodieSchema.createDecimal(name, nameSpace, null, d.precision, d.scale, fixedSize)
 
       // Complex types
+      // Check for VECTOR type metadata property in spark struct type
+      case arrayType @ ArrayType(FloatType, containsNull) // for now checking floats but will need to check element type
+          if metadata.contains(HoodieSchema.TYPE_METADATA_FIELD) &&
+            metadata.getString(HoodieSchema.TYPE_METADATA_FIELD).equalsIgnoreCase(HoodieSchemaType.VECTOR.name()) =>
+        if (containsNull) {
+          throw new IncompatibleSchemaException(
+            s"VECTOR type does not support nullable elements (field: $recordName)")
+        }
+
+        val typeMetadata = HoodieSchema.parseTypeMetadata(metadata.getString(HoodieSchema.TYPE_METADATA_PROPS_FIELD))
+        val dimension = typeMetadata.get("vector.dimension").toInt

Review Comment:
   Could we define a static final constant for the `vector.dimension` property, so it is explicit that this key is expected to exist for the VECTOR type?
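Something along these lines, as a sketch only; `VECTOR_DIMENSION_PROP` and `requireVectorDimension` are hypothetical names, not existing Hudi APIs. Centralizing the key keeps the writer (callers of `buildTypeMetadata`) and the reader (this Spark converter) from drifting apart, and a missing or non-numeric value fails with a descriptive error:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch class; the constant would presumably live on HoodieSchema.
public class VectorMetadataSketch {

  // Hypothetical constant replacing the inline "vector.dimension" literal.
  public static final String VECTOR_DIMENSION_PROP = "vector.dimension";

  // Hypothetical helper: reads the dimension via the shared constant and fails
  // clearly if the key is absent, non-numeric, or not positive.
  static int requireVectorDimension(Map<String, String> typeMetadata, String fieldName) {
    String raw = typeMetadata.get(VECTOR_DIMENSION_PROP);
    if (raw == null) {
      throw new IllegalArgumentException(
          "VECTOR field '" + fieldName + "' is missing required property " + VECTOR_DIMENSION_PROP);
    }
    int dimension;
    try {
      dimension = Integer.parseInt(raw.trim());
    } catch (NumberFormatException e) {
      throw new IllegalArgumentException(
          "VECTOR field '" + fieldName + "' has non-integer " + VECTOR_DIMENSION_PROP + ": " + raw, e);
    }
    if (dimension <= 0) {
      throw new IllegalArgumentException(
          "VECTOR field '" + fieldName + "' must have a positive dimension, got " + dimension);
    }
    return dimension;
  }

  public static void main(String[] args) {
    Map<String, String> md = new LinkedHashMap<>();
    md.put(VECTOR_DIMENSION_PROP, "128");
    System.out.println(requireVectorDimension(md, "embedding")); // 128
  }
}
```

The converter would then look up the dimension through the shared (hypothetical) constant instead of repeating the string literal.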



-- 
This is an automated message from the Apache Git Service.