(spark) branch master updated: [SPARK-46760][SQL][DOCS] Make the document of spark.sql.adaptive.coalescePartitions.parallelismFirst clearer

srowen Sat, 03 Feb 2024 07:07:10 -0800

This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git



The following commit(s) were added to refs/heads/master by this push:
     new 9d4d41c43f1c [SPARK-46760][SQL][DOCS] Make the document of 
spark.sql.adaptive.coalescePartitions.parallelismFirst clearer
9d4d41c43f1c is described below

commit 9d4d41c43f1cb4cf724e0e27c1762df8bbdf2a54
Author: beliefer <[email protected]>
AuthorDate: Sat Feb 3 09:06:38 2024 -0600

    [SPARK-46760][SQL][DOCS] Make the document of 
spark.sql.adaptive.coalescePartitions.parallelismFirst clearer
    
    ### What changes were proposed in this pull request?
    This PR proposes to make the documentation of 
`spark.sql.adaptive.coalescePartitions.parallelismFirst` clearer.
    
    ### Why are the changes needed?
    The default value of 
`spark.sql.adaptive.coalescePartitions.parallelismFirst` is true, but the 
documentation says it is `recommended to set this config to false and respect 
the configured target size`, which is confusing.
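    
    For context, here is a minimal sketch of how the flag changes the target
    size used for coalescing, following the config description in the diff
    below. This is a simplified illustration, not Spark's actual
    implementation: `effectiveTargetSize` and its parameter names are
    hypothetical, and the exact rounding and clamping are assumptions.
    
    ```scala
    object TargetSizeSketch {
      // Hypothetical helper (not a Spark API): returns the target partition size in bytes.
      def effectiveTargetSize(
          totalShuffleBytes: Long,     // total size of the shuffle output
          defaultParallelism: Int,     // default parallelism of the cluster
          advisorySizeBytes: Long,     // spark.sql.adaptive.advisoryPartitionSizeInBytes
          minPartitionSizeBytes: Long, // spark.sql.adaptive.coalescePartitions.minPartitionSize
          parallelismFirst: Boolean): Long = {
        if (parallelismFirst) {
          // Assumption: spread the data over all available slots, never exceeding
          // the advisory size and never dropping below the minimum partition size.
          val perSlot = math.ceil(totalShuffleBytes.toDouble / defaultParallelism).toLong
          perSlot.min(advisorySizeBytes).max(minPartitionSizeBytes)
        } else {
          // Respect the configured target size.
          advisorySizeBytes
        }
      }
    
      def main(args: Array[String]): Unit = {
        val mib = 1024L * 1024L
        val total = 1024L * mib // 1 GiB of shuffle data
        // 1 GiB / 200 slots is about 5 MiB, well below the 64 MiB advisory size.
        println(effectiveTargetSize(total, 200, 64 * mib, 1 * mib, parallelismFirst = true))
        // With the flag off, the advisory size is used as-is.
        println(effectiveTargetSize(total, 200, 64 * mib, 1 * mib, parallelismFirst = false))
      }
    }
    ```
    
    Under these assumptions, the calculated size is smaller than the
    configured target size whenever the shuffle data is small relative to the
    cluster's parallelism, which matches the "usually smaller" wording in the
    config description.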
    
    ### Does this PR introduce _any_ user-facing change?
    'Yes'.
    The documentation is clearer.
    
    ### How was this patch tested?
    N/A
    
    ### Was this patch authored or co-authored using generative AI tooling?
    'No'.
    
    Closes #44787 from beliefer/SPARK-46760.
    
    Authored-by: beliefer <[email protected]>
    Signed-off-by: Sean Owen <[email protected]>
---
 docs/sql-performance-tuning.md                                       | 2 +-
 .../src/main/scala/org/apache/spark/sql/internal/SQLConf.scala       | 5 +++--
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/docs/sql-performance-tuning.md b/docs/sql-performance-tuning.md
index 1dbe1bb7e1a2..25c22d660562 100644
--- a/docs/sql-performance-tuning.md
+++ b/docs/sql-performance-tuning.md
@@ -267,7 +267,7 @@ This feature coalesces the post shuffle partitions based on 
the map output stati
      
<td><code>spark.sql.adaptive.coalescePartitions.parallelismFirst</code></td>
      <td>true</td>
      <td>
-       When true, Spark ignores the target size specified by 
<code>spark.sql.adaptive.advisoryPartitionSizeInBytes</code> (default 64MB) 
when coalescing contiguous shuffle partitions, and only respect the minimum 
partition size specified by 
<code>spark.sql.adaptive.coalescePartitions.minPartitionSize</code> (default 
1MB), to maximize the parallelism. This is to avoid performance regression when 
enabling adaptive query execution. It's recommended to set this config to false 
and respect th [...]
+       When true, Spark ignores the target size specified by 
<code>spark.sql.adaptive.advisoryPartitionSizeInBytes</code> (default 64MB) 
when coalescing contiguous shuffle partitions, and only respect the minimum 
partition size specified by 
<code>spark.sql.adaptive.coalescePartitions.minPartitionSize</code> (default 
1MB), to maximize the parallelism. This is to avoid performance regressions 
when enabling adaptive query execution. It's recommended to set this config to 
true on a busy clus [...]
      </td>
      <td>3.2.0</td>
    </tr>
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
index d88cbed6b27d..1bff0ff1a350 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
@@ -713,8 +713,9 @@ object SQLConf {
         "shuffle partitions, but adaptively calculate the target size 
according to the default " +
         "parallelism of the Spark cluster. The calculated size is usually 
smaller than the " +
         "configured target size. This is to maximize the parallelism and avoid 
performance " +
-        "regression when enabling adaptive query execution. It's recommended 
to set this config " +
-        "to false and respect the configured target size.")
+        "regressions when enabling adaptive query execution. It's recommended 
to set this " +
+        "config to true on a busy cluster to make resource utilization more 
efficient (not many " +
+        "small tasks).")
       .version("3.2.0")
       .booleanConf
       .createWithDefault(true)


