rahil-c commented on code in PR #18190:
URL: https://github.com/apache/hudi/pull/18190#discussion_r2886715620
##########
hudi-client/hudi-spark-client/src/main/scala/org/apache/spark/sql/avro/HoodieSparkSchemaConverters.scala:
##########
@@ -196,25 +233,29 @@ object HoodieSparkSchemaConverters {
val newRecordNames = existingRecordNames + fullName
val fields = hoodieSchema.getFields.asScala.map { f =>
val schemaType = toSqlTypeHelper(f.schema(), newRecordNames)
- val commentMetadata = if (f.doc().isPresent && !f.doc().get().isEmpty) {
- new MetadataBuilder().putString("comment", f.doc().get()).build()
- } else {
- Metadata.empty
- }
val fieldSchema = f.getNonNullSchema
- val metadata = if (fieldSchema.isBlobField) {
- // Mark blob fields with metadata for identification.
- // This assumes blobs are always part of a record and not the top level schema itself
- new MetadataBuilder()
- .withMetadata(commentMetadata)
- .putString(HoodieSchema.TYPE_METADATA_FIELD, HoodieSchemaType.BLOB.name())
- .build()
- } else {
- commentMetadata
+ val metadataBuilder = new MetadataBuilder()
+ .withMetadata(schemaType.metadata.getOrElse(Metadata.empty))
+ if (f.doc().isPresent && f.doc().get().nonEmpty) {
+ metadataBuilder.putString("comment", f.doc().get())
+ }
+ if (fieldSchema.getType == HoodieSchemaType.VECTOR) {
Review Comment:
Yeah, I think it looks confusing, but this is under the record case
(`| HoodieSchemaType.RECORD`), which has a list of Hudi schema fields:
https://github.com/apache/hudi/pull/18190/changes#diff-3e2a24e519a0cf4b097131aa5ad08a41a9d16bebc85aa39cf168bc344e9a2e0dR234
In the iteration we examine each Hudi schema field and have to check for
the custom Hudi logical types, currently `VECTOR` or `BLOB`. Since Spark has
no notion of a vector or blob, we have to set the "hudi_type" metadata field
before constructing the Spark StructField and store the appropriate metadata,
so that the field is not treated as, for example, a regular `ArrayType` of
numeric values.
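
To illustrate the idea, here is a minimal sketch using only the public Spark `MetadataBuilder` and `StructField` APIs. It is not the code in this PR: the object name, the `taggedField` and `hudiTypeOf` helpers, and the hard-coded "hudi_type" key (standing in for `HoodieSchema.TYPE_METADATA_FIELD`) are assumptions made for the example.

```scala
import org.apache.spark.sql.types._

object HudiTypeMetadataSketch {
  // Assumed key name, taken from the "hudi_type" meta field mentioned above.
  val HudiTypeKey = "hudi_type"

  // Hypothetical helper: builds a StructField whose metadata carries an
  // optional comment and an optional Hudi logical type tag.
  def taggedField(name: String,
                  dataType: DataType,
                  doc: Option[String],
                  hudiType: Option[String]): StructField = {
    val builder = new MetadataBuilder()
    doc.filter(_.nonEmpty).foreach(d => builder.putString("comment", d))
    hudiType.foreach(t => builder.putString(HudiTypeKey, t))
    StructField(name, dataType, nullable = true, builder.build())
  }

  // Hypothetical helper: reads the tag back, if the field has one.
  def hudiTypeOf(field: StructField): Option[String] =
    if (field.metadata.contains(HudiTypeKey)) Some(field.metadata.getString(HudiTypeKey))
    else None

  // Both fields have the same Spark type; only the metadata tells them apart.
  val vectorField: StructField =
    taggedField("embedding", ArrayType(FloatType, containsNull = false), Some("embedding vector"), Some("VECTOR"))
  val plainField: StructField =
    taggedField("scores", ArrayType(FloatType, containsNull = false), None, None)
}
```

Without the tag, a reader of the Spark schema cannot tell `vectorField` from `plainField`, which is why the metadata has to be set before the StructField is constructed.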