Re: [PR] feat(schema): Add support to write shredded variants for HoodieRecordType.SPARK [hudi]

via GitHub Mon, 06 Apr 2026 07:54:43 -0700


voonhous commented on code in PR #18036:
URL: https://github.com/apache/hudi/pull/18036#discussion_r3040056041



##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/io/storage/row/HoodieRowParquetWriteSupport.java:
##########
@@ -403,6 +557,33 @@ private ValueWriter makeWriter(HoodieSchema schema, DataType dataType) {
     }
   }
 
+  /**
+   * Creates a ValueWriter for a shredded Variant column.
+   * This writer converts a Variant value into its shredded components (metadata, value, typed_value) and writes them to Parquet.
+   *
+   * @param shreddedStructType The shredded StructType (with shredding metadata)
+   * @return A ValueWriter that handles shredded Variant writing
+   */
+  private ValueWriter makeShreddedVariantWriter(StructType shreddedStructType) {
+    // Create writers for the shredded struct fields
+    // The shreddedStructType contains: metadata (binary), value (binary), typed_value (optional)
+    ValueWriter[] shreddedFieldWriters = Arrays.stream(shreddedStructType.fields())

Review Comment:
   Ignore. 
   
   `SparkShreddingUtils.castShredded()` already normalizes all types before 
they reach the writer. By the time data hits `#makeWriter()`, the Spark 
`DataType` already carries everything the writer needs (microsecond precision 
for timestamps, precision/scale for decimals).
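To make the argument concrete, here is a minimal sketch, assuming a hypothetical mini type model (`DataType`, `DecimalType`, `TimestampMicrosType` and `BinaryType` below are illustrative stand-ins, not Spark's classes, and `physicalTypeFor` is not a Hudi method). Once types are normalized, the data type alone decides the Parquet physical encoding, so the schema argument is never read and may be `null`. The decimal thresholds follow the Parquet DECIMAL logical type rules: INT32 up to precision 9, INT64 up to precision 18, otherwise a fixed-length byte array.

```java
/**
 * Hypothetical sketch: a tiny stand-in for a Spark-like DataType hierarchy,
 * showing that the Parquet physical encoding can be chosen from the data
 * type alone, without any extra schema object.
 */
final class DataTypeOnlyEncoding {

    // Hypothetical mini type model (NOT Spark's API). Each type carries the
    // details a writer needs, e.g. precision and scale for decimals.
    interface DataType {}
    record DecimalType(int precision, int scale) implements DataType {}
    record TimestampMicrosType() implements DataType {}
    record BinaryType() implements DataType {}

    private DataTypeOnlyEncoding() {}

    /**
     * Picks a Parquet physical type from the data type alone.
     *
     * @param schema ignored on purpose; callers may pass null
     * @param type   the normalized data type
     */
    static String physicalTypeFor(Object schema, DataType type) {
        if (type instanceof DecimalType d) {
            if (d.precision() <= 9) {
                return "INT32";                   // unscaled value fits in 32 bits
            }
            if (d.precision() <= 18) {
                return "INT64";                   // unscaled value fits in 64 bits
            }
            return "FIXED_LEN_BYTE_ARRAY";        // wider decimals
        }
        if (type instanceof TimestampMicrosType) {
            return "INT64 (TIMESTAMP_MICROS)";    // unit is already fixed by the type
        }
        if (type instanceof BinaryType) {
            return "BINARY";
        }
        throw new IllegalArgumentException("Unsupported type: " + type);
    }
}
```

Calling `physicalTypeFor(null, new DecimalType(18, 4))` yields `INT64`: nothing beyond the type's own precision is consulted.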
   
   TL;DR: a `HoodieSchema` would be redundant here, so passing `null` is fine.
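A second sketch applies the same idea to the shredded struct from the diff: one child writer per field (`metadata`, `value`, `typed_value`), each created with a `null` schema. Everything here (`Field`, `ValueWriter`, `makeWriter`, the string event log) is a hypothetical stand-in, not Hudi's implementation. The `startField`/`endField` events only loosely mirror the calls that parquet-mr's `RecordConsumer` receives.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch (not Hudi's implementation) of a shredded Variant
 * writer: one child writer per struct field, each built with a null schema.
 */
final class ShreddedVariantWriterSketch {

    // Hypothetical stand-ins for a struct field and a value writer.
    record Field(String name, String dataType) {}

    interface ValueWriter {
        void write(Object value, List<String> events);
    }

    private ShreddedVariantWriterSketch() {}

    // The schema parameter is never read: the data type alone decides how a
    // value is written, so callers may pass null.
    static ValueWriter makeWriter(Object schema, String dataType) {
        return (value, events) -> events.add(dataType + ":" + value);
    }

    static ValueWriter makeShreddedVariantWriter(Field[] fields) {
        // Build one child writer per shredded field, passing a null schema.
        ValueWriter[] fieldWriters = Arrays.stream(fields)
            .map(f -> makeWriter(null, f.dataType()))
            .toArray(ValueWriter[]::new);

        return (value, events) -> {
            Object[] row = (Object[]) value;
            for (int i = 0; i < fields.length; i++) {
                // Null fields are skipped, e.g. typed_value when the variant
                // could not be shredded and is stored in `value` instead.
                if (row[i] == null) {
                    continue;
                }
                events.add("startField(" + fields[i].name() + ")");
                fieldWriters[i].write(row[i], events);
                events.add("endField(" + fields[i].name() + ")");
            }
        };
    }

    // Writes one row whose typed_value is null and returns the event log.
    static List<String> demo() {
        Field[] fields = {
            new Field("metadata", "binary"),
            new Field("value", "binary"),
            new Field("typed_value", "long")
        };
        List<String> events = new ArrayList<>();
        makeShreddedVariantWriter(fields).write(new Object[] {"m", "v", null}, events);
        return events;
    }
}
```

Because `typed_value` is null in the `demo()` row, only `metadata` and `value` produce events; the per-field writers never needed a schema to do their job.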



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

