TheR1sing3un commented on code in PR #14161:
URL: https://github.com/apache/hudi/pull/14161#discussion_r2587264427


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/HoodieFileGroupReaderBasedFileFormat.scala:
##########
@@ -202,6 +214,8 @@ class HoodieFileGroupReaderBasedFileFormat(tablePath: String,
     val requestedAvroSchema = AvroSchemaUtils.pruneDataSchema(avroTableSchema, AvroConversionUtils.convertStructTypeToAvroSchema(requestedSchema, sanitizedTableName), exclusionFields)
     val dataAvroSchema = AvroSchemaUtils.pruneDataSchema(avroTableSchema, AvroConversionUtils.convertStructTypeToAvroSchema(dataSchema, sanitizedTableName), exclusionFields)
 
+    spark.sessionState.conf.setConfString("spark.sql.parquet.enableVectorizedReader", supportVectorizedRead.toString)

Review Comment:
   > A friendly reminder: if we modify this configuration in the Spark 
sessionState conf from within the Hudi logic, it may disrupt the read path of 
other datasources. For example, suppose this configuration is initially set to 
true and a Spark SQL query reads both a Hudi table and a table from another 
datasource, such as a Hive table. The behavior we want is that Hudi decides 
for itself whether to perform vectorized reads, while the Hive table is read 
with vectorization as configured. If we change the configuration here, 
however, the Hive read may no longer be vectorized.
   > 
   > cc @jonvex @yihua
   
   Same problem in `BaseFileOnlyRelation.scala`: 
https://github.com/apache/hudi/issues/9129 
https://github.com/apache/hudi/pull/10134 
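
To make the concern concrete, here is a minimal, Spark-free sketch of the leak. Everything in it except the configuration key is hypothetical: `SessionConf`, `usesVectorizedRead`, and the two `hudiRead...` helpers are stand-ins invented for illustration, not Spark or Hudi APIs. It only models the general pattern of writing a per-reader decision into shared mutable state versus applying it to a private copy.

```scala
import scala.collection.mutable

object ScopedConfSketch {
  val VectorizedKey = "spark.sql.parquet.enableVectorizedReader"

  // Hypothetical stand-in for a session-wide, mutable configuration
  // that every datasource in the same query reads from.
  final class SessionConf(initial: Map[String, String]) {
    private val settings = mutable.Map(initial.toSeq: _*)
    def set(key: String, value: String): Unit = settings(key) = value
    def snapshot: Map[String, String] = settings.toMap
  }

  // Hypothetical reader: decides on vectorization from the conf it is handed.
  def usesVectorizedRead(conf: Map[String, String]): Boolean =
    conf.getOrElse(VectorizedKey, "false").toBoolean

  // Pattern from the diff: the Hudi decision is written into the shared conf.
  def hudiReadMutatingSession(session: SessionConf, supportVectorizedRead: Boolean): Boolean = {
    session.set(VectorizedKey, supportVectorizedRead.toString)
    usesVectorizedRead(session.snapshot)
  }

  // Assumed alternative: the decision is applied to a private copy only.
  def hudiReadWithScopedConf(session: SessionConf, supportVectorizedRead: Boolean): Boolean = {
    val scoped = session.snapshot + (VectorizedKey -> supportVectorizedRead.toString)
    usesVectorizedRead(scoped)
  }

  // Returns what a second, non-Hudi reader would see after each Hudi read.
  def demo(): (Boolean, Boolean) = {
    val shared = new SessionConf(Map(VectorizedKey -> "true"))
    hudiReadMutatingSession(shared, supportVectorizedRead = false)
    val otherReaderAfterMutation = usesVectorizedRead(shared.snapshot)

    val untouched = new SessionConf(Map(VectorizedKey -> "true"))
    hudiReadWithScopedConf(untouched, supportVectorizedRead = false)
    val otherReaderAfterScoped = usesVectorizedRead(untouched.snapshot)

    (otherReaderAfterMutation, otherReaderAfterScoped)
  }
}
```

In the first case the second reader loses vectorization even though the user enabled it for the session; in the second case it keeps the original setting. How a scoped override would actually be wired into `HoodieFileGroupReaderBasedFileFormat` is not shown here and would depend on the reader's existing plumbing.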



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
