[ https://issues.apache.org/jira/browse/SPARK-45894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated SPARK-45894:
-----------------------------------
    Labels: pull-request-available  (was: )

> hive table level setting hadoop.mapred.max.split.size
> -----------------------------------------------------
>
>                 Key: SPARK-45894
>                 URL: https://issues.apache.org/jira/browse/SPARK-45894
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.5.0
>            Reporter: guihuawen
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.5.0
>
>
> In a Hive table scan, configuring the hadoop.mapred.max.split.size
> parameter increases the parallelism of the scan stage and thereby reduces
> the running time.
> However, when a large table and a small table appear in the same query, a
> single global hadoop.mapred.max.split.size setting leaves some stages
> running a very large number of tasks while other stages run very few.
> Allowing hadoop.mapred.max.split.size to be set separately for each Hive
> table would balance the task counts across these stages.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
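
The issue targets per-table granularity; today the split size can only be set for the whole session. A minimal sketch of the current behavior the reporter describes, assuming hypothetical tables `big_table` and `small_table` (the `spark.hadoop.` prefix is Spark's standard way to forward a setting into the Hadoop configuration used by the Hive reader; the 128 MB value is illustrative only):

```sql
-- Session-wide workaround today: one split size for every Hive table scan.
-- 134217728 bytes = 128 MB; a smaller value yields more splits, hence more tasks.
SET spark.hadoop.mapred.max.split.size=134217728;

-- Both scans below are forced to share the same split size, which is the
-- imbalance described in the issue: big_table and small_table cannot be
-- tuned independently, so one scan stage gets far more tasks than the other.
SELECT *
FROM big_table b
JOIN small_table s ON b.id = s.id;
```

The proposed improvement would let each table carry its own hadoop.mapred.max.split.size so the two scan stages above could be balanced independently; the exact syntax is defined in the linked pull request, not here.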