Manu Zhang created SPARK-32698:
----------------------------------
Summary: Do not fall back to default parallelism if the minimum
number of coalesced partitions is not set in AQE
Key: SPARK-32698
URL: https://issues.apache.org/jira/browse/SPARK-32698
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.0.0
Reporter: Manu Zhang
Currently in AQE, when coalescing shuffle partitions,
{quote}We fall back to Spark default parallelism if the minimum number of
coalesced partitions is not set, so to avoid perf regressions compared to no
coalescing.
{quote}
From our experience, this has resulted in a lot of uncertainty about the number
of tasks after coalescing, especially with dynamic allocation, and has also led
to many small output files. It's complex and hard to reason about.
Hence, I propose that we do not fall back to the default parallelism and instead
coalesce towards the target size when the minimum number of coalesced partitions
is not set.
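
As a rough illustration of the difference, here is a simplified pure-Python model of the two behaviours. It is only a sketch, not the actual Spark code: the helper names (target_size, coalesce), the greedy packing of adjacent partitions, and the example numbers (200 shuffle partitions of 1 MiB, a 64 MiB advisory size, a default parallelism of 100) are all assumptions made for the example.

```python
import math

MIB = 1024 * 1024


def target_size(total_bytes, advisory_bytes, min_partition_num=None,
                default_parallelism=None, fall_back=True):
    """Pick the per-partition target size used for coalescing (hypothetical helper).

    With fall_back=True, an unset minimum is replaced by the default
    parallelism (the behaviour described in this issue). With
    fall_back=False, an unset minimum is treated as 1, so only the
    advisory size limits how far partitions are merged.
    """
    if min_partition_num is None:
        min_partition_num = default_parallelism if fall_back else 1
    # Largest target that still leaves at least min_partition_num partitions.
    max_target = math.ceil(total_bytes / min_partition_num)
    return min(max_target, advisory_bytes)


def coalesce(partition_sizes, target):
    """Greedily merge adjacent partitions without exceeding the target size."""
    groups = []
    current = 0
    for size in partition_sizes:
        if current > 0 and current + size > target:
            groups.append(current)
            current = 0
        current += size
    if current > 0:
        groups.append(current)
    return groups


# Assumed workload: 200 shuffle partitions of 1 MiB each.
sizes = [MIB] * 200
advisory = 64 * MIB

# Falling back to an assumed default parallelism of 100.
current = coalesce(sizes, target_size(sum(sizes), advisory,
                                      default_parallelism=100, fall_back=True))
# No fallback: coalesce towards the advisory target size only.
proposed = coalesce(sizes, target_size(sum(sizes), advisory, fall_back=False))

print(len(current), len(proposed))
```

In this model the fallback caps the target at about 2 MiB, so the stage keeps 100 small partitions, and that number moves with whatever the default parallelism happens to be (for example under dynamic allocation). Without the fallback, the same data is packed into 4 partitions near the advisory size.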