wombatu-kun commented on code in PR #18938:
URL: https://github.com/apache/hudi/pull/18938#discussion_r3446636163
##########
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/dml/schema/TestVariantDataType.scala:
##########
@@ -142,6 +150,8 @@ class TestVariantDataType extends HoodieSparkSqlTestBase {
| primaryKey = 'id',
| type = 'mor',
| preCombineField = 'ts',
+ | hoodie.parquet.variant.write.shredding.enabled = 'true',
Review Comment:
This test does not exercise the AVRO reconstruction it is meant to cover.
For a non-metadata Spark MOR table, compaction reads base files via
`SparkReaderContextFactory`
(`HoodieSparkEngineContext.getReaderContextFactory`), the native InternalRow
reader, regardless of merger; `withRecordType` only swaps the record merger and
the log block format, not the base-file reader, so `HoodieAvroParquetReader`
(where `HoodieVariantReconstruction` runs) is never reached on either record
type. `rebuildVariantRecord`, the
`AvroVariantRow`/`AvroObjectRow`/`AvroArrayRow` accessors, and both new
fail-fast branches are reachable only via `HoodieAvroReaderContext` (Java/Flink
compaction, MDT, or `LOG_COMPACT`), which this PR does not test. Consider a
Java-client compaction round-trip over a shredded base file, or a direct unit
test of `rebuildVariantRecord` / `HoodieVariantReconstruction`, and cover the
array/decimal/numeric variant shapes (current data is one string-valued object
field).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]