voonhous commented on code in PR #17833:
URL: https://github.com/apache/hudi/pull/17833#discussion_r2740986644


##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/io/storage/row/HoodieRowParquetWriteSupport.java:
##########
@@ -281,6 +282,18 @@ private ValueWriter makeWriter(HoodieSchema schema, DataType dataType) {
     } else if (dataType == DataTypes.BinaryType) {
       return (row, ordinal) -> recordConsumer.addBinary(
           Binary.fromReusedByteArray(row.getBinary(ordinal)));
+    } else if (SparkAdapterSupport$.MODULE$.sparkAdapter().isVariantType(dataType)) {
+      // Maps VariantType to a group containing 'metadata' and 'value' fields.
+      // This ensures Spark 4.0 compatibility and supports both Shredded and Unshredded schemas.
+      // Note: We intentionally omit 'typed_value' for shredded variants as this writer only accesses raw binary blobs.
+      BiConsumer<SpecializedGetters, Integer> variantWriter = SparkAdapterSupport$.MODULE$.sparkAdapter().createVariantValueWriter(
+          dataType,
+          valueBytes -> consumeField("value", 0, () -> recordConsumer.addBinary(Binary.fromReusedByteArray(valueBytes))),

Review Comment:
   I traced the code a little, and I think you're right.
   
   CMIIW if this does not align with your mental model: the `Variant` value is read from the row via `org.apache.spark.sql.catalyst.expressions.UnsafeRow#getVariant`:
   
   ```
   @Override
   public VariantVal getVariant(int ordinal) {
     if (isNullAt(ordinal)) return null;
     return VariantVal.readFromUnsafeRow(getLong(ordinal), baseObject, baseOffset);
   }
   ```
   
   Looking at `org.apache.spark.unsafe.types.VariantVal#readFromUnsafeRow`, new byte arrays are allocated for both `metadata` and `value`.
   
   So these are essentially copies, not views over the row's backing buffer.
   
   I will change `fromReusedByteArray` to `fromConstantByteArray` then.
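
To illustrate the reasoning, here is a minimal, self-contained sketch of the "reused vs. constant" ownership contract. It is a simplified model and not Parquet's actual `Binary` class: `BufferOwnershipSketch`, `Wrapped`, `fromReused`, and `fromConstant` are hypothetical names, and the assumption (as I understand `Binary`) is that a value created from a reused array must be defensively copied before it is retained, while one created from a constant array can be shared as-is.

```java
import java.util.Arrays;

/**
 * Simplified, hypothetical model of a byte-array wrapper that records whether
 * its backing array may be overwritten by the caller. This is NOT Parquet's
 * Binary class; it only mirrors the ownership contract discussed above.
 */
final class BufferOwnershipSketch {

    /** Hypothetical stand-in for a binary value handed to a writer. */
    static final class Wrapped {
        private final byte[] bytes;
        private final boolean backingBytesReused;

        private Wrapped(byte[] bytes, boolean backingBytesReused) {
            this.bytes = bytes;
            this.backingBytesReused = backingBytesReused;
        }

        /** The caller may overwrite the array later, so holders must copy it. */
        static Wrapped fromReused(byte[] bytes) {
            return new Wrapped(bytes, true);
        }

        /** The caller promises never to mutate the array, so it can be shared. */
        static Wrapped fromConstant(byte[] bytes) {
            return new Wrapped(bytes, false);
        }

        /** Returns a value that is safe to retain beyond the current call. */
        Wrapped copy() {
            return backingBytesReused ? fromConstant(bytes.clone()) : this;
        }

        byte[] bytes() {
            return bytes;
        }
    }

    public static void main(String[] args) {
        // A scratch buffer that the producer overwrites for the next record.
        byte[] scratch = {1, 2, 3};
        Wrapped retained = Wrapped.fromReused(scratch).copy();
        scratch[0] = 9; // the producer reuses the buffer
        System.out.println(Arrays.toString(retained.bytes())); // prints [1, 2, 3]

        // A freshly allocated array that nobody else mutates: no copy needed.
        byte[] fresh = {4, 5, 6};
        Wrapped constant = Wrapped.fromConstant(fresh);
        System.out.println(constant.copy() == constant); // prints true
    }
}
```

Under that assumption, marking the freshly allocated `metadata` and `value` arrays as constant avoids a redundant defensive copy whenever the writer needs to hold on to the value.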


