[ https://issues.apache.org/jira/browse/SPARK-34346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17280700#comment-17280700 ]

Apache Spark commented on SPARK-34346:
--------------------------------------

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/31515

> io.file.buffer.size set by spark.buffer.size is overridden by hive-site.xml, 
> which may cause a perf regression
> -----------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-34346
>                 URL: https://issues.apache.org/jira/browse/SPARK-34346
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, SQL
>    Affects Versions: 3.0.2, 3.1.1
>            Reporter: Kent Yao
>            Assignee: Kent Yao
>            Priority: Blocker
>             Fix For: 3.0.2, 3.1.1
>
>
> In many real-world cases, when interacting with the Hive catalog through Spark 
> SQL, users simply share the `hive-site.xml` from their Hive jobs and copy it 
> to `SPARK_HOME`/conf without modification. In Spark, when we generate the 
> Hadoop configuration, we use `spark.buffer.size` (default 65536) to override 
> `io.file.buffer.size` (Hadoop default 4096). But when we then load 
> hive-site.xml, we ignore that and reset `io.file.buffer.size` again to 
> whatever `hive-site.xml` says. This has two problems:
> 1. The configuration priority for applying Hadoop and Hive settings here is 
> wrong; the order should be `spark > spark.hive > spark.hadoop > hive > 
> hadoop`.
> 2. It breaks the `spark.buffer.size` config's ability to tune I/O performance 
> with HDFS whenever `hive-site.xml` already contains an `io.file.buffer.size` 
> entry (see the sketch below).
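
A minimal sketch of how to observe which value wins, assuming a Spark 3.x
build with Hive support and a hive-site.xml on the classpath that sets
io.file.buffer.size (the object name BufferSizeCheck is illustrative, not
part of the issue):

    import org.apache.spark.sql.SparkSession

    object BufferSizeCheck {
      def main(args: Array[String]): Unit = {
        // Build a session with Hive support so hive-site.xml on the
        // classpath is loaded, and set Spark's intended buffer size.
        val spark = SparkSession.builder()
          .appName("io-file-buffer-size-check")
          .master("local[1]")
          .config("spark.buffer.size", "65536")
          .enableHiveSupport()
          .getOrCreate()

        // If hive-site.xml carries, e.g.,
        //   <property>
        //     <name>io.file.buffer.size</name><value>4096</value>
        //   </property>
        // an affected build prints 4096 here; a fixed build prints 65536,
        // because spark.buffer.size takes priority over hive-site.xml.
        println(spark.sparkContext.hadoopConfiguration
          .get("io.file.buffer.size"))

        spark.stop()
      }
    }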



