voonhous commented on code in PR #17833:
URL: https://github.com/apache/hudi/pull/17833#discussion_r2717419124


##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/io/storage/row/HoodieRowParquetWriteSupport.java:
##########
@@ -280,6 +281,23 @@ private ValueWriter makeWriter(HoodieSchema schema, DataType dataType) {
     } else if (dataType == DataTypes.BinaryType) {
       return (row, ordinal) -> recordConsumer.addBinary(
           Binary.fromReusedByteArray(row.getBinary(ordinal)));
+    } else if (SparkAdapterSupport$.MODULE$.sparkAdapter().isVariantType(dataType)) {

Review Comment:
   Don't think so, I have a test that uses `HoodieRecordType.{AVRO, SPARK}`. It should exercise both write support paths, and there are no test failures.
   
   In Avro, Variant is already an Avro record created by `HoodieSchema.createVariant`, with fields `value (bytes)` and `metadata (bytes)`.
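   For reference, a minimal sketch of that record shape built with plain Avro `SchemaBuilder`; this only illustrates the layout described above, it is not the actual `HoodieSchema.createVariant` implementation:
   
   ```java
   import org.apache.avro.Schema;
   import org.apache.avro.SchemaBuilder;
   
   class VariantSchemaSketch {
     // Illustration only: an Avro record with the two bytes fields described above.
     static Schema variantLikeSchema() {
       return SchemaBuilder.record("variant").fields()
           .requiredBytes("value")     // encoded variant value bytes
           .requiredBytes("metadata")  // variant metadata bytes
           .endRecord();
     }
   }
   ```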
   
   IIUC, Parquet's `AvroWriteSupport` handles this automatically, since it already knows how to convert (quick sanity check in the sketch below):
   - Avro record -> Parquet group
   - Avro bytes -> Parquet binary
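   For example, feeding a variant-shaped record through Parquet's own `AvroSchemaConverter` (which, if I read it right, is what the Avro write path relies on for the schema side) yields exactly that mapping. Illustration only, not code from this PR:
   
   ```java
   import org.apache.avro.Schema;
   import org.apache.avro.SchemaBuilder;
   import org.apache.parquet.avro.AvroSchemaConverter;
   import org.apache.parquet.schema.MessageType;
   
   class VariantAvroToParquetSketch {
     public static void main(String[] args) {
       // Same variant-shaped record as in the previous sketch.
       Schema variant = SchemaBuilder.record("variant").fields()
           .requiredBytes("value")
           .requiredBytes("metadata")
           .endRecord();
   
       // Record -> group/message, bytes -> binary:
       //   message variant {
       //     required binary value;
       //     required binary metadata;
       //   }
       MessageType parquetSchema = new AvroSchemaConverter().convert(variant);
       System.out.println(parquetSchema);
     }
   }
   ```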
   
   `HoodieAvroWriteSupport` just wraps `AvroWriteSupport` to add bloom filter support and does not override the write logic.
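   Roughly the wrapping pattern I mean is the one below. Hand-written sketch, not the actual Hudi class; `KeyFilter` and the `"bloom_filter"` footer key are placeholders:
   
   ```java
   import java.util.HashMap;
   import java.util.Map;
   import org.apache.avro.Schema;
   import org.apache.avro.generic.GenericData;
   import org.apache.avro.generic.IndexedRecord;
   import org.apache.parquet.avro.AvroWriteSupport;
   import org.apache.parquet.schema.MessageType;
   
   /** Placeholder for whatever bloom-filter implementation is used; not a real Hudi type. */
   interface KeyFilter {
     void add(String key);
     String serializeToString();
   }
   
   class BloomFilterAwareWriteSupport extends AvroWriteSupport<IndexedRecord> {
     private final KeyFilter keyFilter;
   
     BloomFilterAwareWriteSupport(MessageType parquetSchema, Schema avroSchema, KeyFilter keyFilter) {
       super(parquetSchema, avroSchema, GenericData.get());
       this.keyFilter = keyFilter;
     }
   
     // Record keys are tracked on the side; record serialization is untouched.
     void addKey(String recordKey) {
       keyFilter.add(recordKey);
     }
   
     // Only extra footer metadata is added; write(...) is still AvroWriteSupport's.
     @Override
     public FinalizedWriteContext finalizeWrite() {
       Map<String, String> extraMetadata = new HashMap<>();
       extraMetadata.put("bloom_filter", keyFilter.serializeToString()); // placeholder key
       return new FinalizedWriteContext(extraMetadata);
     }
   }
   ```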
   
   In the Spark Row path, custom handling is needed because Spark's `VariantType` requires special APIs (`createVariantValueWriter`) to extract the raw bytes; from what I can see in our code there is no automatic Spark VariantType -> Parquet conversion.
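   For the thread, a rough sketch of the kind of writer that branch needs, written in the style of the surrounding `makeWriter(...)` branches. It assumes the variant lands in Parquet as a `value`/`metadata` group like the Avro layout above, and it is not the actual code in this PR:
   
   ```java
   // Sketch only. getVariantValueBytes / getVariantMetadataBytes are placeholders
   // for whatever the version-specific adapter's createVariantValueWriter provides
   // to pull the raw bytes out of the row.
   return (row, ordinal) -> {
     byte[] value = getVariantValueBytes(row, ordinal);       // placeholder accessor
     byte[] metadata = getVariantMetadataBytes(row, ordinal); // placeholder accessor
     recordConsumer.startGroup();
     recordConsumer.startField("value", 0);
     recordConsumer.addBinary(Binary.fromReusedByteArray(value));
     recordConsumer.endField("value", 0);
     recordConsumer.startField("metadata", 1);
     recordConsumer.addBinary(Binary.fromReusedByteArray(metadata));
     recordConsumer.endField("metadata", 1);
     recordConsumer.endGroup();
   };
   ```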



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
