voonhous commented on code in PR #18938:
URL: https://github.com/apache/hudi/pull/18938#discussion_r3434972702


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieHadoopFsRelationFactory.scala:
##########
@@ -56,12 +56,41 @@ trait HoodieHadoopFsRelationFactory {
   def buildOptions(): Map[String, String]
 }
 
+object HoodieBaseHadoopFsRelationFactory {
+  /**
+   * Resolves the variant allow-reading-shredded value using the precedence:
+   * table option > hoodie session conf > explicit Spark conf > Hudi default.
+   */
+  private[hudi] def resolveVariantAllowReadingShredded(tableOption: Option[String],
+                                                       hoodieSessionValue: Option[String],
+                                                       sparkConfValue: Option[String],
+                                                       hudiDefault: String): String =
+    tableOption.orElse(hoodieSessionValue).orElse(sparkConfValue).getOrElse(hudiDefault)
+}
+
 abstract class HoodieBaseHadoopFsRelationFactory(val sqlContext: SQLContext,
                                                  val metaClient: HoodieTableMetaClient,
                                                  val options: Map[String, String],
                                                  val schemaSpec: Option[StructType],
                                                  val isBootstrap: Boolean
                                                 ) extends SparkAdapterSupport with HoodieHadoopFsRelationFactory with Logging {
+  // Propagate Hudi's variant allow-reading-shredded config to Spark's SQLConf.
+  // ParquetToSparkSchemaConverter reads this from SQLConf.get(), so it must be set
+  // before query execution starts here during table resolution
+  if (HoodieSparkUtils.gteqSpark4_0) {
+    val sqlConf = sqlContext.sparkSession.sessionState.conf
+    val hoodieConfKey = HoodieStorageConfig.PARQUET_VARIANT_ALLOW_READING_SHREDDED.key
+    // Literal, not SQLConf.VARIANT_ALLOW_READING_SHREDDED.key: that field is absent when this module compiles against Spark 3.x.
+    val sparkConfKey = "spark.sql.variant.allowReadingShredded"
+    // Precedence: table option > hoodie session key > explicit Spark conf > Hudi default.
+    val allowReadingShredded = HoodieBaseHadoopFsRelationFactory.resolveVariantAllowReadingShredded(
+      options.get(hoodieConfKey),
+      if (sqlConf.contains(hoodieConfKey)) Some(sqlConf.getConfString(hoodieConfKey)) else None,
+      if (sqlConf.contains(sparkConfKey)) Some(sqlConf.getConfString(sparkConfKey)) else None,
+      HoodieStorageConfig.PARQUET_VARIANT_ALLOW_READING_SHREDDED.defaultValue.toString)
+    sqlConf.setConfString(sparkConfKey, allowReadingShredded)

Review Comment:
   Stale: this is exactly what the final #18065 commit, "do not mutate session SQLConf for variant allow-reading-shredded", addressed. The constructor no longer mutates the session `SQLConf`, and the `resolveVariantAllowReadingShredded` helper and this entire block are gone from `HoodieHadoopFsRelationFactory`.
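The leak the removed block caused can be shown with a simplified sketch. It uses a mutable `Map` as a stand-in for the session `SQLConf`. The object and method names are hypothetical, and `"true"` is an illustrative default rather than Hudi's actual one. The sketch reproduces the removed precedence logic and shows what happens once the constructor writes the resolved value back: a later relation reads that value as if it were an "explicit Spark conf".

```scala
import scala.collection.mutable

// Hypothetical model of the removed constructor block. A mutable Map stands in
// for the session SQLConf; this is NOT Hudi's or Spark's real API.
object SessionMutationLeak {
  val HoodieKey = "hoodie.parquet.variant.allow.reading.shredded"
  val SparkKey  = "spark.sql.variant.allowReadingShredded"

  // Same precedence as the removed helper:
  // table option > hoodie session conf > explicit Spark conf > Hudi default.
  def resolve(tableOption: Option[String],
              hoodieSessionValue: Option[String],
              sparkConfValue: Option[String],
              hudiDefault: String): String =
    tableOption.orElse(hoodieSessionValue).orElse(sparkConfValue).getOrElse(hudiDefault)

  // Mimics relation construction: resolve, then write the result into the
  // shared session (the side effect the final commit removed).
  def buildRelation(session: mutable.Map[String, String],
                    tableOptions: Map[String, String],
                    hudiDefault: String): String = {
    val value = resolve(
      tableOptions.get(HoodieKey),
      session.get(HoodieKey),
      session.get(SparkKey),
      hudiDefault)
    session(SparkKey) = value
    value
  }

  def main(args: Array[String]): Unit = {
    val session = mutable.Map.empty[String, String]
    // Table A sets the option explicitly.
    val a = buildRelation(session, Map(HoodieKey -> "false"), hudiDefault = "true")
    // Table B sets nothing, yet inherits A's value through the session.
    val b = buildRelation(session, Map.empty, hudiDefault = "true")
    println(s"A=$a B=$b") // A=false B=false
  }
}
```

Table B sets no option, yet it resolves `"false"` instead of the default, because table A's resolved value is now sitting in the session.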
   
   How it works now, query-scoped:
   - Spark native read path: relies on Spark's own `SQLConf.get(VARIANT_ALLOW_READING_SHREDDED)`, so an explicit user value is honored and nothing leaks from Hudi.
   - AVRO read path (added in this PR): reads the Hudi key `hoodie.parquet.variant.allow.reading.shredded` from the Hadoop conf in `HoodieVariantReconstruction`, never touching session state.
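The query-scoped Avro-path behavior can be sketched as a pure lookup against a per-query conf. This sketch rests on several assumptions:

- The object and method names are hypothetical.
- An immutable `Map` stands in for the Hadoop conf handed to the reader.
- Table options are layered over session values when that conf is built. This comment does not confirm that part.

```scala
// Hypothetical model of the query-scoped approach: each read gets its own
// immutable conf snapshot (standing in for the Hadoop conf handed to the
// reader), and nothing is written back to the session.
object QueryScopedResolution {
  val HoodieKey = "hoodie.parquet.variant.allow.reading.shredded"

  // Assumption: table options are layered over session values when the
  // per-query conf is built; on key collisions the right side of ++ wins.
  def readerConfFor(session: Map[String, String],
                    tableOptions: Map[String, String]): Map[String, String] =
    session ++ tableOptions

  // Reads the Hudi key from the per-query conf, else the supplied default.
  // String.toBoolean accepts "true"/"false" (case-insensitive) and throws otherwise.
  def allowReadingShredded(readerConf: Map[String, String], hudiDefault: Boolean): Boolean =
    readerConf.get(HoodieKey).map(_.trim.toBoolean).getOrElse(hudiDefault)

  def main(args: Array[String]): Unit = {
    val session = Map.empty[String, String] // immutable: a read cannot mutate it
    val a = allowReadingShredded(readerConfFor(session, Map(HoodieKey -> "false")), hudiDefault = true)
    val b = allowReadingShredded(readerConfFor(session, Map.empty), hudiDefault = true)
    println(s"A=$a B=$b") // A=false B=true
  }
}
```

The session is only read, never written. Table B therefore falls back to the default no matter what table A set, which gives the per-query isolation described above.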
   
   This file isn't part of this PR's diff either.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
