voonhous commented on code in PR #18938:
URL: https://github.com/apache/hudi/pull/18938#discussion_r3434952786
##########
hudi-spark-datasource/hudi-spark4.0.x/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/Spark40ParquetReader.scala:
##########
@@ -276,6 +277,12 @@ object Spark40ParquetReader extends SparkParquetReaderBuilder {
)
hadoopConf.setBoolean(SQLConf.PARQUET_INFER_TIMESTAMP_NTZ_ENABLED.key,
sqlConf.parquetInferTimestampNTZEnabled)
+ hadoopConf.setBoolean(SQLConf.VARIANT_ALLOW_READING_SHREDDED.key,
Review Comment:
Stale -- the reader no longer pushes this flag. The final #18065 removed it
("do not mutate session SQLConf for variant allow-reading-shredded"): neither
`Spark40ParquetReader` nor `Spark41ParquetReader` sets
`SQLConf.VARIANT_ALLOW_READING_SHREDDED` on the Hadoop conf anymore. It was
dead code: `ParquetToSparkSchemaConverter` reads that flag via `SQLConf.get`,
not from the Hadoop conf, so an explicit
`spark.sql.variant.allowReadingShredded` is honored directly and no Hudi
default can clobber it. That resolves the precedence concern.

For context, the AVRO read path added in this PR uses a *separate* key,
Hudi's `hoodie.parquet.variant.allow.reading.shredded`, which
`HoodieVariantReconstruction` reads from the Hadoop conf on a per-query basis;
it never touches the Spark session conf.
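
To make the separation concrete, here is a minimal sketch of the two lookups, not the actual reader code. It assumes Spark 4.0.x and Hadoop on the classpath; the object name `VariantFlagSketch`, the standalone `Configuration`, and the `false` fallback passed to `getBoolean` are illustrative assumptions, and only the two key names come from this thread.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.internal.SQLConf

// Hypothetical sketch: shows that the Spark key and the Hudi key live in
// two separate config stores and are looked up independently.
object VariantFlagSketch {
  // Key names as quoted in this review thread.
  val SparkKey = "spark.sql.variant.allowReadingShredded"
  val HudiKey = "hoodie.parquet.variant.allow.reading.shredded"

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("variant-flag-sketch")
      .getOrCreate()
    try {
      // The user sets the Spark flag explicitly on the session.
      spark.conf.set(SparkKey, "false")

      // Stand-in for a query-scoped Hadoop conf; only the Hudi key is set here.
      val hadoopConf = new Configuration(false)
      hadoopConf.setBoolean(HudiKey, true)

      // Spark-side lookup goes through SQLConf.get, never the Hadoop conf.
      val sparkSide = SQLConf.get.getConfString(SparkKey)

      // Hudi-side lookup reads the Hadoop conf only.
      // The `false` fallback is an assumption, not Hudi's documented default.
      val hudiSide = hadoopConf.getBoolean(HudiKey, false)

      println(s"Spark flag via SQLConf.get: $sparkSide")
      println(s"Hudi flag via Hadoop conf:  $hudiSide")
      // The Hudi key was never written to the session conf.
      println(s"Hudi key in session conf:   ${spark.conf.getOption(HudiKey)}")
    } finally {
      spark.stop()
    }
  }
}
```

Because neither lookup consults the other store, writing the Spark key into the Hadoop conf would have had no effect on `ParquetToSparkSchemaConverter`, which is why the removed push was dead code.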