TheR1sing3un commented on code in PR #14161:
URL: https://github.com/apache/hudi/pull/14161#discussion_r2587242503
##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/HoodieFileGroupReaderBasedFileFormat.scala:
##########
@@ -202,6 +214,8 @@ class HoodieFileGroupReaderBasedFileFormat(tablePath: String,
val requestedAvroSchema = AvroSchemaUtils.pruneDataSchema(avroTableSchema,
AvroConversionUtils.convertStructTypeToAvroSchema(requestedSchema,
sanitizedTableName), exclusionFields)
val dataAvroSchema = AvroSchemaUtils.pruneDataSchema(avroTableSchema,
AvroConversionUtils.convertStructTypeToAvroSchema(dataSchema,
sanitizedTableName), exclusionFields)
+    spark.sessionState.conf.setConfString("spark.sql.parquet.enableVectorizedReader", supportVectorizedRead.toString)
Review Comment:
A friendly reminder:
If we modify this configuration on the Spark session's `sessionState.conf` from
within Hudi logic, it may disrupt the read path of other datasources, because
the change applies to the whole session.
For example, suppose this configuration is initially set to true and a single
Spark SQL query reads both a Hudi table and a table from another datasource,
such as a Hive table. The behavior we want is that Hudi's own logic decides
whether the Hudi read is vectorized, while the Hive table is still read with
vectorized reading.
However, if we change the configuration here, the Hive table may end up not
being read with vectorized reading.
A similar issue was encountered previously in `BaseRelation.java`.
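To illustrate the concern, here is a minimal, dependency-free sketch. `SessionConf` and the `planHudiRead*` helpers are hypothetical stand-ins that I made up for this example; they are not Spark or Hudi APIs. Only the config key and the `supportVectorizedRead` name come from the diff above. It assumes the other datasource consults the same shared conf when it plans its read.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a session-wide configuration (NOT Spark's SQLConf).
final class SessionConf {
  private val settings = mutable.Map[String, String]()
  def set(key: String, value: String): Unit = settings.update(key, value)
  def get(key: String, default: String): String = settings.getOrElse(key, default)
  def snapshot: Map[String, String] = settings.toMap
}

object SharedConfLeak {
  val VectorizedKey = "spark.sql.parquet.enableVectorizedReader"

  // Pattern under review: the Hudi read writes its decision into the shared conf.
  def planHudiReadMutating(conf: SessionConf, supportVectorizedRead: Boolean): Unit =
    conf.set(VectorizedKey, supportVectorizedRead.toString)

  // Possible alternative: derive a reader-local copy and leave the shared conf alone.
  def planHudiReadLocal(conf: SessionConf, supportVectorizedRead: Boolean): Map[String, String] =
    conf.snapshot + (VectorizedKey -> supportVectorizedRead.toString)

  // What another datasource (e.g. a Hive table read) sees in the shared conf.
  def otherSourceIsVectorized(conf: SessionConf): Boolean =
    conf.get(VectorizedKey, "true").toBoolean

  def main(args: Array[String]): Unit = {
    val mutated = new SessionConf
    mutated.set(VectorizedKey, "true")
    planHudiReadMutating(mutated, supportVectorizedRead = false)
    println(s"shared conf mutated: other source vectorized = ${otherSourceIsVectorized(mutated)}")

    val untouched = new SessionConf
    untouched.set(VectorizedKey, "true")
    val hudiOptions = planHudiReadLocal(untouched, supportVectorizedRead = false)
    println(s"reader-local copy: other source vectorized = ${otherSourceIsVectorized(untouched)}, " +
      s"hudi vectorized = ${hudiOptions(VectorizedKey)}")
  }
}
```

In the first case the Hudi decision leaks to the other source; in the second the Hudi read carries its own value and the shared setting stays as the user configured it. Whether a reader-local override is feasible here depends on how the underlying Parquet reader picks up this flag, which this sketch does not model.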
cc @jonvex @yihua