Re: [PR] feat(variant): support reading shredded variant base files via the AVRO reader [hudi]

via GitHub Thu, 18 Jun 2026 03:10:20 -0700


voonhous commented on code in PR #18938:
URL: https://github.com/apache/hudi/pull/18938#discussion_r3434952786



##########
hudi-spark-datasource/hudi-spark4.0.x/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/Spark40ParquetReader.scala:
##########
@@ -276,6 +277,12 @@ object Spark40ParquetReader extends SparkParquetReaderBuilder {
     )
     hadoopConf.setBoolean(SQLConf.PARQUET_INFER_TIMESTAMP_NTZ_ENABLED.key, sqlConf.parquetInferTimestampNTZEnabled)
 
+    hadoopConf.setBoolean(SQLConf.VARIANT_ALLOW_READING_SHREDDED.key,

Review Comment:
   Stale -- this reader-side push no longer exists. The final version of #18065 
removed it ("do not mutate session SQLConf for variant 
allow-reading-shredded"): neither `Spark40ParquetReader` nor 
`Spark41ParquetReader` sets `SQLConf.VARIANT_ALLOW_READING_SHREDDED` on the 
Hadoop conf anymore. It was dead code -- `ParquetToSparkSchemaConverter` reads 
that flag via `SQLConf.get`, not from the Hadoop conf, so an explicit 
`spark.sql.variant.allowReadingShredded` is honored directly and no Hudi 
default clobbers it. That resolves the precedence concern.
   
   For context, the AVRO read path added in this PR uses a *separate* key, the 
Hudi `hoodie.parquet.variant.allow.reading.shredded`, which 
`HoodieVariantReconstruction` reads from the Hadoop conf at query scope; it 
never touches the Spark session conf.
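
   To illustrate the two lookups described above, here is a minimal toy sketch. 
It is not the Hudi or Spark code: plain `Map`s stand in for the session conf 
and the Hadoop conf, `sessionScopedFlag` and `queryScopedFlag` are 
hypothetical helpers, and the default values are arbitrary assumptions. Only 
the two key names come from this thread.

   ```scala
   // Toy model only: plain Maps stand in for the Spark session conf and the Hadoop conf.
   object ConfLookupSketch {
     // Key names as quoted in this review thread.
     val SparkKey = "spark.sql.variant.allowReadingShredded"
     val HudiKey  = "hoodie.parquet.variant.allow.reading.shredded"

     // Hypothetical stand-in for a consumer that consults only the session conf.
     def sessionScopedFlag(sessionConf: Map[String, String], default: Boolean): Boolean =
       sessionConf.get(SparkKey).map(_.toBoolean).getOrElse(default)

     // Hypothetical stand-in for a consumer that consults only the Hadoop conf.
     def queryScopedFlag(hadoopConf: Map[String, String], default: Boolean): Boolean =
       hadoopConf.get(HudiKey).map(_.toBoolean).getOrElse(default)

     def main(args: Array[String]): Unit = {
       // The user explicitly disables shredded reads on the session.
       val sessionConf = Map(SparkKey -> "false")
       // The Hadoop conf carries the (now removed) reader push plus the Hudi key.
       val hadoopConf = Map(SparkKey -> "true", HudiKey -> "true")

       // The session-scoped consumer never sees the Hadoop conf entry,
       // so pushing SparkKey there has no effect on its result.
       assert(!sessionScopedFlag(sessionConf, default = true))

       // The query-scoped consumer reads its own key and ignores the session conf.
       assert(queryScopedFlag(hadoopConf, default = false))
     }
   }
   ```

   In this model, writing the Spark key into the Hadoop conf changes nothing 
for the session-scoped lookup, which is why the removed push was dead code, 
and the two keys stay independent because each consumer reads a different 
source.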
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
