[ 
https://issues.apache.org/jira/browse/SPARK-45894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45894:
-----------------------------------
    Labels: pull-request-available  (was: )

> hive table level setting hadoop.mapred.max.split.size
> -----------------------------------------------------
>
>                 Key: SPARK-45894
>                 URL: https://issues.apache.org/jira/browse/SPARK-45894
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.5.0
>            Reporter: guihuawen
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.5.0
>
>
> In a Hive table scan, setting a smaller value for the 
> hadoop.mapred.max.split.size parameter increases the parallelism of 
> the scan stage, which can reduce the running time.
> However, when a large table and a small table appear in the same query, a 
> single hadoop.mapred.max.split.size value applies to both, so some stages 
> run a very large number of tasks while others run very few. 
> Allowing hadoop.mapred.max.split.size to be set separately for each Hive 
> table at runtime would keep the task counts balanced.
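
To illustrate the imbalance described in the issue, here is a rough sketch of the arithmetic, not Spark or Hadoop code. It assumes the number of scan tasks is approximately the table size divided by the maximum split size, and it ignores file boundaries, block size, and minimum split size, all of which affect the real split count. The table names, table sizes, split sizes, and the helper estimate_splits are hypothetical.

```python
import math

MB = 1024 * 1024
GB = 1024 * MB


def estimate_splits(total_bytes: int, max_split_bytes: int) -> int:
    """Approximate task count: one task per max_split_bytes of input.

    Simplifying assumption: the table is treated as one contiguous
    byte range, so per-file and per-block effects are ignored.
    """
    if max_split_bytes <= 0:
        raise ValueError("max_split_bytes must be positive")
    return max(1, math.ceil(total_bytes / max_split_bytes))


# Hypothetical table sizes for a query that scans both tables.
tables = {"large_table": 1024 * GB, "small_table": 1 * GB}

# Case 1: a single max split size shared by every table in the query.
global_split = 256 * MB
global_tasks = {
    name: estimate_splits(size, global_split) for name, size in tables.items()
}

# Case 2: a max split size chosen per table (the idea proposed in the issue).
per_table_split = {"large_table": 1024 * MB, "small_table": 16 * MB}
per_table_tasks = {
    name: estimate_splits(size, per_table_split[name])
    for name, size in tables.items()
}

print("global setting:   ", global_tasks)
print("per-table setting:", per_table_tasks)
```

Under these assumed numbers, the shared setting gives the large table thousands of tasks and the small table only a handful, while per-table values bring the two stages much closer together.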



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
