yihua commented on code in PR #18190:
URL: https://github.com/apache/hudi/pull/18190#discussion_r2801589430
##########
hudi-common/src/main/java/org/apache/hudi/common/schema/HoodieSchema.java:
##########
@@ -80,14 +80,40 @@
* @since 1.2.0
*/
public class HoodieSchema implements Serializable {
+ private static final long serialVersionUID = 1L;
/**
* Constant representing a null JSON value, equivalent to JsonProperties.NULL_VALUE.
* This provides compatibility with Avro's JsonProperties while maintaining Hudi's API.
*/
public static final Object NULL_VALUE = JsonProperties.NULL_VALUE;
public static final HoodieSchema NULL_SCHEMA = HoodieSchema.create(HoodieSchemaType.NULL);
- private static final long serialVersionUID = 1L;
+ /**
+ * Constant to use when attaching type metadata to external schema systems like Spark's StructType.
+ */
+ public static final String TYPE_METADATA_FIELD = "hudi_type";
+ public static final String TYPE_METADATA_PROPS_FIELD = "hudi_type_metadata";
+
+ /**
+ * Builds a comma-separated key=value metadata string from the given map.
+ * Example: {"vector.dimension": "128"} → "vector.dimension=128"
+ */
+ public static String buildTypeMetadata(Map<String, String> props) {
+ return props.entrySet().stream()
+ .map(e -> e.getKey() + "=" + e.getValue())
+ .collect(Collectors.joining(","));
+ }
+
+ /**
Review Comment:
`parseTypeMetadata` will throw if a metadata entry is missing the `=`
delimiter (e.g., `"vector.dimension"` without `=128`). Can a user supply such a
malformed entry, or is this string only ever constructed internally?
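
To make the concern concrete, here is a minimal sketch of a parser that rejects an entry without `=` with a descriptive error instead of an index or array exception. The class name `TypeMetadataSketch` and the method `parse` are hypothetical stand-ins, not the PR's `parseTypeMetadata`; the sketch only assumes the comma-separated `key=value` format that `buildTypeMetadata` produces.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for HoodieSchema.parseTypeMetadata; not the PR's code.
final class TypeMetadataSketch {

  // Parses "k1=v1,k2=v2" into an insertion-ordered map.
  // Assumption: keys and values contain neither ',' nor a leading '='.
  static Map<String, String> parse(String metadata) {
    Map<String, String> props = new LinkedHashMap<>();
    if (metadata == null || metadata.trim().isEmpty()) {
      return props; // nothing to parse
    }
    for (String entry : metadata.split(",")) {
      int idx = entry.indexOf('=');
      if (idx <= 0) {
        // Covers both a missing '=' and an empty key.
        throw new IllegalArgumentException(
            "Malformed type metadata entry '" + entry + "': expected key=value");
      }
      props.put(entry.substring(0, idx).trim(), entry.substring(idx + 1).trim());
    }
    return props;
  }
}
```

If the string is only ever produced internally, a check like this may be unnecessary; if it can come from user-supplied Spark metadata, failing with a message that names the offending entry seems easier to debug.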
##########
hudi-client/hudi-spark-client/src/main/scala/org/apache/spark/sql/avro/HoodieSparkSchemaConverters.scala:
##########
@@ -78,6 +79,25 @@ object HoodieSparkSchemaConverters {
HoodieSchema.createDecimal(name, nameSpace, null, d.precision, d.scale, fixedSize)
// Complex types
+ // Check for VECTOR type metadata property in spark struct type
+ case arrayType @ ArrayType(FloatType, containsNull) // for now checking floats but will need to check element type
+ if metadata.contains(HoodieSchema.TYPE_METADATA_FIELD) &&
+ metadata.getString(HoodieSchema.TYPE_METADATA_FIELD).equalsIgnoreCase(HoodieSchemaType.VECTOR.name()) =>
+ if (containsNull) {
+ throw new IncompatibleSchemaException(
+ s"VECTOR type does not support nullable elements (field: $recordName)")
+ }
+
+ val typeMetadata = HoodieSchema.parseTypeMetadata(metadata.getString(HoodieSchema.TYPE_METADATA_PROPS_FIELD))
+ val dimension = typeMetadata.get("vector.dimension").toInt
Review Comment:
Could we define a `static final` constant for the `vector.dimension` property
key, so that the key required by the vector type is declared in one place?
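
One way this could look, as a rough sketch: the constant name `VECTOR_DIMENSION_PROP`, the holder class `VectorTypeMetadataSketch`, and the helper `dimensionOf` are all hypothetical; only the key string `vector.dimension` comes from the PR. The helper also shows that a missing key can be reported explicitly rather than surfacing as a null dereference at the `.toInt` call site.

```java
import java.util.Map;

// Hypothetical holder class; in the PR the constant would presumably live
// next to TYPE_METADATA_FIELD in HoodieSchema.
final class VectorTypeMetadataSketch {

  // Hypothetical constant name for the key used by the VECTOR type.
  static final String VECTOR_DIMENSION_PROP = "vector.dimension";

  // Reads and validates the dimension from already-parsed type metadata.
  static int dimensionOf(Map<String, String> typeMetadata) {
    String raw = typeMetadata.get(VECTOR_DIMENSION_PROP);
    if (raw == null) {
      throw new IllegalArgumentException(
          "VECTOR type metadata is missing '" + VECTOR_DIMENSION_PROP + "'");
    }
    // Integer.parseInt throws NumberFormatException on non-numeric input.
    int dimension = Integer.parseInt(raw.trim());
    if (dimension <= 0) {
      throw new IllegalArgumentException(
          "VECTOR dimension must be positive, got " + dimension);
    }
    return dimension;
  }
}
```

With a shared constant, the Java builder side and the Scala converter side would reference the same key instead of repeating the string literal.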