Re: [PR] feat(vector): add converters from spark to hoodieSchema for vectors [hudi]

via GitHub Wed, 04 Mar 2026 15:43:36 -0800


rahil-c commented on code in PR #18190:
URL: https://github.com/apache/hudi/pull/18190#discussion_r2886715620



##########
hudi-client/hudi-spark-client/src/main/scala/org/apache/spark/sql/avro/HoodieSparkSchemaConverters.scala:
##########
@@ -196,25 +233,29 @@ object HoodieSparkSchemaConverters {
         val newRecordNames = existingRecordNames + fullName
         val fields = hoodieSchema.getFields.asScala.map { f =>
           val schemaType = toSqlTypeHelper(f.schema(), newRecordNames)
-          val commentMetadata = if (f.doc().isPresent && !f.doc().get().isEmpty) {
-            new MetadataBuilder().putString("comment", f.doc().get()).build()
-          } else {
-            Metadata.empty
-          }
           val fieldSchema = f.getNonNullSchema
-          val metadata = if (fieldSchema.isBlobField) {
-            // Mark blob fields with metadata for identification.
-            // This assumes blobs are always part of a record and not the top level schema itself
-            new MetadataBuilder()
-              .withMetadata(commentMetadata)
-              .putString(HoodieSchema.TYPE_METADATA_FIELD, HoodieSchemaType.BLOB.name())
-              .build()
-          } else {
-            commentMetadata
+          val metadataBuilder = new MetadataBuilder()
+            .withMetadata(schemaType.metadata.getOrElse(Metadata.empty))
+          if (f.doc().isPresent && f.doc().get().nonEmpty) {
+            metadataBuilder.putString("comment", f.doc().get())
+          }
+          if (fieldSchema.getType == HoodieSchemaType.VECTOR) {

Review Comment:
   Yeah, I agree it looks confusing, but this is under the record case (`| HoodieSchemaType.RECORD`), which has a list of HoodieSchema fields:
https://github.com/apache/hudi/pull/18190/changes#diff-3e2a24e519a0cf4b097131aa5ad08a41a9d16bebc85aa39cf168bc344e9a2e0dR234
   
   In the iteration we examine each HoodieSchema field and check for the custom Hudi logical types, currently `VECTOR` or `BLOB`. Since Spark has no notion of a vector or a blob, we have to attach the "hudi_type" metadata field, along with any other appropriate metadata, before constructing the Spark `StructField`, so that the field is not treated as, for example, a regular `ArrayType` of numeric values.
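   A minimal, self-contained sketch of the per-field logic described above, under simplified assumptions: `FieldType`, `Field`, `fieldMetadata` and `isVector` are hypothetical stand-ins (not Hudi or Spark APIs), and Spark's `Metadata` is modeled as a plain `Map[String, String]`. The tag key `"hudi_type"` is taken from the comment; the diff references it through `HoodieSchema.TYPE_METADATA_FIELD`. The sketch only records the type name, as the removed BLOB branch did with `HoodieSchemaType.BLOB.name()`; the real VECTOR branch may store more than this, which the hunk does not show.

```scala
// Hypothetical sketch: these types are simplified stand-ins, not Hudi or Spark APIs.
// Spark's Metadata is modeled as an immutable Map[String, String].
object HudiTypeTagSketch {

  // Stand-in for HoodieSchemaType, limited to the cases discussed in the thread.
  sealed trait FieldType { def name: String }
  case object VectorType extends FieldType { val name = "VECTOR" }
  case object BlobType extends FieldType { val name = "BLOB" }
  case object PlainArrayType extends FieldType { val name = "ARRAY" }

  // Stand-in for a HoodieSchema field: name, non-null type and optional doc.
  final case class Field(name: String, fieldType: FieldType, doc: Option[String])

  // Assumed tag key, taken from the review comment ("hudi_type").
  val TypeMetadataField = "hudi_type"

  // Custom Hudi logical types with no native Spark equivalent.
  private val customTypes: Set[FieldType] = Set(VectorType, BlobType)

  // Mirrors the order in the diff: keep metadata already produced for the
  // converted type, add "comment" for a non-empty doc, then tag custom types.
  def fieldMetadata(f: Field, existing: Map[String, String] = Map.empty): Map[String, String] = {
    val withComment = f.doc.filter(_.nonEmpty) match {
      case Some(text) => existing + ("comment" -> text)
      case None       => existing
    }
    if (customTypes.contains(f.fieldType)) withComment + (TypeMetadataField -> f.fieldType.name)
    else withComment
  }

  // Reader side: only the tag marks a vector; an untagged numeric array stays a plain array.
  def isVector(metadata: Map[String, String]): Boolean =
    metadata.get(TypeMetadataField).contains(VectorType.name)

  def main(args: Array[String]): Unit = {
    val embedding = Field("embedding", VectorType, Some("text embedding"))
    val scores = Field("scores", PlainArrayType, None)
    println(isVector(fieldMetadata(embedding))) // prints true
    println(isVector(fieldMetadata(scores)))    // prints false
  }
}
```

   The design choice this mirrors: the type information lives in field metadata rather than in the Spark data type, so the column is still an ordinary Spark array, while code that checks the tag can tell a Hudi vector apart from any other numeric array.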
   



-- 
This is an automated message from the Apache Git Service.
