xiarixiaoyao commented on code in PR #10134:
URL: https://github.com/apache/hudi/pull/10134#discussion_r1398156052
##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/BaseFileOnlyRelation.scala:
##########
@@ -81,7 +81,9 @@ case class BaseFileOnlyRelation(override val sqlContext: SQLContext,
     super.imbueConfigs(sqlContext)
     // TODO Issue with setting this to true in spark 332
     if (HoodieSparkUtils.gteqSpark3_4 || !HoodieSparkUtils.gteqSpark3_3_2) {
-      sqlContext.sparkSession.sessionState.conf.setConfString("spark.sql.parquet.enableVectorizedReader", "true")
Review Comment:
We should not use contains here; Spark always injects this config.
This place forcibly set the value to true because the parent class of
BaseFileOnlyRelation previously forced
spark.sql.parquet.enableVectorizedReader=false. Users were unable to turn on
vectorization, which caused a significant decrease in query performance, so at
that time it was forcibly set to true in this location.
However, the PR for https://issues.apache.org/jira/browse/HUDI-3639 removed
that behavior from the parent class of BaseFileOnlyRelation.
Therefore, we no longer need to force
spark.sql.parquet.enableVectorizedReader=true in BaseFileOnlyRelation;
whether to enable vectorization is left to the user to decide.
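
To illustrate what leaving the decision to the user could look like, here is a minimal sketch that sets the flag through the session's runtime config. The object name VectorizedReaderConfig and the helper setVectorizedReader are hypothetical and are not part of Hudi or of this PR; only SparkSession, spark.conf.set and spark.conf.get are existing Spark APIs.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical helper for illustration only; not part of Hudi or this PR.
object VectorizedReaderConfig {
  val Key = "spark.sql.parquet.enableVectorizedReader"

  // Apply the user's choice to the current session and return the
  // effective value as Spark reports it afterwards.
  def setVectorizedReader(spark: SparkSession, enabled: Boolean): String = {
    spark.conf.set(Key, enabled.toString)
    spark.conf.get(Key)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("vectorized-reader-example")
      .master("local[1]")
      .getOrCreate()

    // The user turns vectorization off; nothing forces it back on.
    println(setVectorizedReader(spark, enabled = false))

    // The user turns it on again when they want the faster read path.
    println(setVectorizedReader(spark, enabled = true))

    spark.stop()
  }
}
```

With the forced setConfString call removed, whichever value the user sets this way (or via --conf at submit time) should be the one the relation reads with.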