(incubator-gluten) branch main updated: [VL] Minor: Change sort shuffle partition threshold to 4000 (#9866)

yuanzhou Wed, 04 Jun 2025 18:51:27 -0700

This is an automated email from the ASF dual-hosted git repository.

yuanzhou pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/incubator-gluten.git



The following commit(s) were added to refs/heads/main by this push:
     new 33c3fb16ea [VL] Minor: Change sort shuffle partition threshold to 4000 (#9866)
33c3fb16ea is described below

commit 33c3fb16eae4e0c30ecdfcc2adb36c6e0e1ed0cd
Author: Rong Ma <[email protected]>
AuthorDate: Thu Jun 5 02:42:15 2025 +0100

    [VL] Minor: Change sort shuffle partition threshold to 4000 (#9866)
---
 docs/Configuration.md                                                   | 2 +-
 shims/common/src/main/scala/org/apache/gluten/config/GlutenConfig.scala | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/Configuration.md b/docs/Configuration.md
index 4ab9285fac..8fb73131c9 100644
--- a/docs/Configuration.md
+++ b/docs/Configuration.md
@@ -43,7 +43,7 @@ You can add these configurations into spark-defaults.conf to enable or disable t
 | spark.gluten.sql.columnar.tableCache                          | Enable or Disable Columnar Table Cache, default is false [...]
 | spark.gluten.sql.columnar.broadcastExchange                   | Enable or Disable Columnar Broadcast Exchange, default is true [...]
 | spark.gluten.sql.columnar.broadcastJoin                       | Enable or Disable Columnar BroadcastHashJoin, default is true [...]
-| spark.gluten.sql.columnar.shuffle.sort.partitions.threshold   | The threshold to determine whether to use sort-based columnar shuffle. Sort-based shuffle will be used if the number of partitions is greater than this threshold. [...]
+| spark.gluten.sql.columnar.shuffle.sort.partitions.threshold   | The threshold to determine whether to use sort-based columnar shuffle. Sort-based shuffle will be used if the number of partitions is greater than this threshold. [...]
 | spark.gluten.sql.columnar.shuffle.sort.columns.threshold      | The threshold to determine whether to use sort-based columnar shuffle. Sort-based shuffle will be used if the number of columns is greater than this threshold. [...]
 | spark.gluten.sql.columnar.shuffle.codec                       | Set up the codec to be used for Columnar Shuffle. If this configuration is not set, will check the value of spark.io.compression.codec. By default, Gluten use software compression. Valid options for software compression are lz4, zstd. Valid options for QAT and IAA is gzip. [...]
 | spark.gluten.sql.columnar.shuffle.codecBackend                | Enable using hardware accelerators for shuffle de/compression. Valid options are QAT and IAA. [...]
diff --git a/shims/common/src/main/scala/org/apache/gluten/config/GlutenConfig.scala b/shims/common/src/main/scala/org/apache/gluten/config/GlutenConfig.scala
index dbf783de13..9f2c81bd40 100644
--- a/shims/common/src/main/scala/org/apache/gluten/config/GlutenConfig.scala
+++ b/shims/common/src/main/scala/org/apache/gluten/config/GlutenConfig.scala
@@ -957,7 +957,7 @@ object GlutenConfig {
       .doc("The threshold to determine whether to use sort-based columnar shuffle. Sort-based " +
         "shuffle will be used if the number of partitions is greater than this threshold.")
       .intConf
-      .createWithDefault(100000)
+      .createWithDefault(4000)
 
   val COLUMNAR_SHUFFLE_SORT_COLUMNS_THRESHOLD =
     buildConf("spark.gluten.sql.columnar.shuffle.sort.columns.threshold")
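
For deployments that relied on the previous default, the old threshold can presumably be pinned explicitly. The fragment below is a minimal sketch, assuming the setting goes into spark-defaults.conf as the documentation hunk above describes for these configurations. The value 100000 is the old default taken from the removed line in the GlutenConfig.scala hunk; whether pinning it is appropriate for a given workload is not stated in this commit.

```
# spark-defaults.conf (sketch; restores the pre-#9866 default)
# Sort-based columnar shuffle is used only when the number of
# partitions is greater than this threshold.
spark.gluten.sql.columnar.shuffle.sort.partitions.threshold   100000
```

With the new default of 4000, a shuffle with more than 4000 partitions would select sort-based columnar shuffle unless the key is overridden as above.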


