Re: [PR] feat(variant): support reading shredded variant base files via the AVRO reader [hudi]

via GitHub Tue, 23 Jun 2026 03:12:42 -0700


voonhous commented on code in PR #18938:
URL: https://github.com/apache/hudi/pull/18938#discussion_r3458800167



##########
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/dml/schema/TestVariantDataType.scala:
##########
@@ -142,6 +150,8 @@ class TestVariantDataType extends HoodieSparkSqlTestBase {
            |  primaryKey = 'id',
            |  type = 'mor',
            |  preCombineField = 'ts',
+           |  hoodie.parquet.variant.write.shredding.enabled = 'true',

Review Comment:
   `withRecordType` only swaps the record merger and log-block format, and 
Spark compaction reads base files via `SparkReaderContextFactory` -> 
`SparkFileFormatInternalRowReaderContext` (InternalRow), so 
`HoodieAvroParquetReader` (where 
`HoodieVariantReconstruction`/`rebuildVariantRecord` run) is never reached on 
either record type.
   
   Fixed in cc41b4f: added `TestSpark4VariantShreddingProvider`, a direct shred 
-> rebuild round-trip over the provider covering scalar 
(numeric/string/boolean/decimal), object, and array shapes - this exercises 
`rebuildVariantRecord` and the `AvroVariantRow`/`AvroObjectRow`/`AvroArrayRow` 
accessors. The two `create()` fail-fast branches are covered by 
`TestHoodieVariantReconstruction` (added for your other comment). Also 
corrected the misleading comment on the SQL test to say it covers the Spark 
InternalRow (native) read of a shredded base file, not the Avro path.
   
   Note on the Java-client suggestion: `Spark4VariantShreddingProvider` is the 
only `VariantShreddingProvider`, so a Java/Flink compaction over a shredded 
base file would hit the no-provider fail-fast branch rather than reconstruct - 
which is why I went with the direct unit test for the reconstruction coverage.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] feat(variant): support reading shredded variant base files via the AVRO reader [hudi]

Reply via email to