TheR1sing3un commented on code in PR #14161:
URL: https://github.com/apache/hudi/pull/14161#discussion_r2587264427
##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/HoodieFileGroupReaderBasedFileFormat.scala:
##########
@@ -202,6 +214,8 @@ class HoodieFileGroupReaderBasedFileFormat(tablePath: String,
val requestedAvroSchema = AvroSchemaUtils.pruneDataSchema(avroTableSchema, AvroConversionUtils.convertStructTypeToAvroSchema(requestedSchema, sanitizedTableName), exclusionFields)
val dataAvroSchema = AvroSchemaUtils.pruneDataSchema(avroTableSchema, AvroConversionUtils.convertStructTypeToAvroSchema(dataSchema, sanitizedTableName), exclusionFields)
+
spark.sessionState.conf.setConfString("spark.sql.parquet.enableVectorizedReader", supportVectorizedRead.toString)
Review Comment:
> A friendly reminder: if Hudi's logic modifies this configuration in the Spark `sessionState` conf, it may disrupt the read logic of other datasources. For example, suppose this configuration is initially set to true and a single Spark SQL query reads both a Hudi table and a table from another datasource, such as a Hive table. The behavior we want is that Hudi's own logic decides whether the Hudi table is read with the vectorized reader, while the Hive table is still read with the vectorized reader. However, because this line changes the session-wide configuration, the Hive table may end up not being read with the vectorized reader.
>
> cc @jonvex @yihua
The same problem came up in `BaseFileOnlyRelation.scala`; see:
https://github.com/apache/hudi/issues/9129
https://github.com/apache/hudi/pull/10134
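A minimal, self-contained sketch of the concern in the quoted comment. It does not use Spark: `SharedConf`, `hiveScanIsVectorized`, `hudiScanMutatingConf` and `hudiScanLocalDecision` are hypothetical stand-ins. It assumes, as the comment implies, that every scan planned in the same session reads `spark.sql.parquet.enableVectorizedReader` from one shared, session-wide conf. Writing Hudi's decision into that conf flips the flag for the Hive scan. Combining the user's setting with Hudi's capability locally leaves it untouched.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a session-wide SQL conf shared by every scan
// planned in the same session. This is NOT Spark's real SQLConf API.
final class SharedConf {
  private val entries = mutable.Map[String, String]()
  def set(key: String, value: String): Unit = entries(key) = value
  def getBoolean(key: String, default: Boolean): Boolean =
    entries.get(key).map(_.toBoolean).getOrElse(default)
}

object VectorizedConfDemo {
  val VectorizedKey = "spark.sql.parquet.enableVectorizedReader"

  // Hypothetical plain parquet/Hive-style scan: it just follows the shared conf.
  def hiveScanIsVectorized(conf: SharedConf): Boolean =
    conf.getBoolean(VectorizedKey, default = true)

  // Pattern flagged in the review: the Hudi-side scan writes its own decision
  // back into the shared conf, so every scan planned afterwards sees it.
  def hudiScanMutatingConf(conf: SharedConf, supportVectorizedRead: Boolean): Boolean = {
    conf.set(VectorizedKey, supportVectorizedRead.toString)
    conf.getBoolean(VectorizedKey, default = true)
  }

  // Alternative: combine the user's setting with Hudi's own capability
  // locally and hand the result down; the shared conf is never written.
  def hudiScanLocalDecision(conf: SharedConf, supportVectorizedRead: Boolean): Boolean =
    conf.getBoolean(VectorizedKey, default = true) && supportVectorizedRead

  def main(args: Array[String]): Unit = {
    val mutated = new SharedConf
    mutated.set(VectorizedKey, "true")
    hudiScanMutatingConf(mutated, supportVectorizedRead = false)
    // prints "mutating: hive vectorized = false"
    println(s"mutating: hive vectorized = ${hiveScanIsVectorized(mutated)}")

    val untouched = new SharedConf
    untouched.set(VectorizedKey, "true")
    hudiScanLocalDecision(untouched, supportVectorizedRead = false)
    // prints "local: hive vectorized = true"
    println(s"local: hive vectorized = ${hiveScanIsVectorized(untouched)}")
  }
}
```

How a locally computed flag would actually reach Hudi's parquet reader depends on how that reader is constructed, and is not shown here.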
--
This is an automated message from the Apache Git Service.