rahil-c commented on code in PR #17833:
URL: https://github.com/apache/hudi/pull/17833#discussion_r2908245794


##########
hudi-client/hudi-spark-client/src/main/scala/org/apache/spark/sql/execution/datasources/SparkSchemaTransformUtils.scala:
##########
@@ -19,10 +19,11 @@
 
 package org.apache.spark.sql.execution.datasources
 
+import org.apache.hudi.HoodieSparkUtils
 import org.apache.spark.sql.HoodieSchemaUtils
 import org.apache.spark.sql.catalyst.expressions.codegen.GenerateUnsafeProjection
 import org.apache.spark.sql.catalyst.expressions.{ArrayTransform, Attribute, AttributeReference, Cast, CreateNamedStruct, CreateStruct, Expression, GetStructField, LambdaFunction, Literal, MapEntries, MapFromEntries, NamedLambdaVariable, UnsafeProjection}
-import org.apache.spark.sql.types.{ArrayType, DataType, DateType, DecimalType, DoubleType, FloatType, IntegerType, LongType, MapType, StringType, StructField, StructType, TimestampNTZType}
+import org.apache.spark.sql.types._

Review Comment:
  [nit] we should probably avoid spelling out all the types here



##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/io/storage/row/HoodieRowParquetWriteSupport.java:
##########
@@ -281,6 +282,18 @@ private ValueWriter makeWriter(HoodieSchema schema, DataType dataType) {
     } else if (dataType == DataTypes.BinaryType) {
       return (row, ordinal) -> recordConsumer.addBinary(
           Binary.fromReusedByteArray(row.getBinary(ordinal)));
+    } else if (SparkAdapterSupport$.MODULE$.sparkAdapter().isVariantType(dataType)) {
+      // Maps VariantType to a group containing 'metadata' and 'value' fields.
+      // This ensures Spark 4.0 compatibility and supports both Shredded and Unshredded schemas.
+      // Note: We intentionally omit 'typed_value' for shredded variants as this writer only accesses raw binary blobs.
+      BiConsumer<SpecializedGetters, Integer> variantWriter = SparkAdapterSupport$.MODULE$.sparkAdapter().createVariantValueWriter(
+          dataType,
+          valueBytes -> consumeField("value", 0, () -> recordConsumer.addBinary(Binary.fromConstantByteArray(valueBytes))),
+          metadataBytes -> consumeField("metadata", 1, () -> recordConsumer.addBinary(Binary.fromConstantByteArray(metadataBytes)))

Review Comment:
   +1
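
The hunk above writes each VariantType value as a two-field Parquet group: raw `value` bytes at field index 0 and `metadata` bytes at index 1, each bracketed by start/end field events on the record consumer. A minimal self-contained sketch of that event pattern, using a hypothetical event list in place of Parquet's real `RecordConsumer` (the `consumeField`, `writeVariant`, and `events` names here are illustrative, not the PR's actual helpers):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the group layout the variant writer targets:
// group { value: binary (index 0), metadata: binary (index 1) }.
// 'typed_value' is deliberately absent, matching the writer's note that
// it only handles raw binary blobs.
public class VariantGroupSketch {
  static final List<String> events = new ArrayList<>();

  // Mirrors the consumeField(name, index, writer) shape: bracket the
  // actual write with start/end field events, as a Parquet record
  // consumer requires.
  static void consumeField(String name, int index, Runnable writer) {
    events.add("startField " + name + " " + index);
    writer.run();
    events.add("endField " + name + " " + index);
  }

  // Emit one variant as a group of two binary fields, value first.
  static void writeVariant(byte[] valueBytes, byte[] metadataBytes) {
    events.add("startGroup");
    consumeField("value", 0,
        () -> events.add("addBinary(" + valueBytes.length + " bytes)"));
    consumeField("metadata", 1,
        () -> events.add("addBinary(" + metadataBytes.length + " bytes)"));
    events.add("endGroup");
  }

  public static void main(String[] args) {
    writeVariant(new byte[]{1, 2, 3}, new byte[]{9});
    events.forEach(System.out::println);
  }
}
```

The fixed field indices (0 for `value`, 1 for `metadata`) match the ordinals passed to `consumeField` in the diff, so readers of both shredded and unshredded files see the two blobs at stable positions within the group.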



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to