Re: [PR] [HUDI-7118] set conf 'spark.sql.parquet.enableVectorizedReader' to true automatically only if the value is not explicitly set [hudi]

via GitHub Sat, 18 Nov 2023 00:51:47 -0800


xiarixiaoyao commented on code in PR #10134:
URL: https://github.com/apache/hudi/pull/10134#discussion_r1398156052



##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/BaseFileOnlyRelation.scala:
##########
@@ -81,7 +81,9 @@ case class BaseFileOnlyRelation(override val sqlContext: SQLContext,
     super.imbueConfigs(sqlContext)
     // TODO Issue with setting this to true in spark 332
     if (HoodieSparkUtils.gteqSpark3_4 || !HoodieSparkUtils.gteqSpark3_3_2) {
-      sqlContext.sparkSession.sessionState.conf.setConfString("spark.sql.parquet.enableVectorizedReader", "true")

Review Comment:
   We should not use contains here, because Spark always injects this config.
   
   This place forcibly sets the config to true because the parent class of BaseFileOnlyRelation previously forced spark.sql.parquet.enableVectorizedReader=false. Users were unable to turn on vectorization, which caused a significant decrease in query performance, so at that time it was forcibly set to true in this location.
   
   However, https://issues.apache.org/jira/browse/HUDI-3639 removed that behavior from the parent class of BaseFileOnlyRelation.
   Therefore, we no longer need to force spark.sql.parquet.enableVectorizedReader=true in BaseFileOnlyRelation; whether to enable vectorization is left to the user.
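
   To illustrate what "left to the user" would look like, here is a minimal sketch written as spark-shell style statements. It assumes a local SparkSession (the app name and master are placeholders, not taken from this PR) and only shows a user setting and reading back the config; whether Hudi then honors the value depends on the forced override actually being removed.

```scala
import org.apache.spark.sql.SparkSession

// Placeholder session for illustration; in a real job the session already exists.
val spark = SparkSession.builder()
  .appName("vectorized-reader-demo") // hypothetical app name
  .master("local[*]")                // assumption: local run
  .getOrCreate()

val key = "spark.sql.parquet.enableVectorizedReader"

// The user makes an explicit choice instead of relying on a forced value.
spark.conf.set(key, "false")

// Read the effective session value back.
val effective = spark.conf.get(key)
println(s"$key = $effective")
```

   The same value can also be supplied at submit time with --conf, so no code change is needed to switch vectorization on or off.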
   



