This is an automated email from the ASF dual-hosted git repository.

yaooqinn pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new 706b6a39b187 [SPARK-56948][SQL][TESTS] Make TPCDSQueryBenchmark heap/broadcast configurable
706b6a39b187 is described below

commit 706b6a39b1876cc888040e65fa192de64616bab0
Author: Kent Yao <[email protected]>
AuthorDate: Wed May 20 12:29:02 2026 +0800

    [SPARK-56948][SQL][TESTS] Make TPCDSQueryBenchmark heap/broadcast configurable
    
    ### What changes were proposed in this pull request?
    
    Switch hardcoded `.set(...)` to `.setIfMissing(...)` for three SparkConf
    keys in `TPCDSQueryBenchmark`:
    
    - `spark.driver.memory`
    - `spark.executor.memory`
    - `spark.sql.autoBroadcastJoinThreshold`
    
    Also switch `spark.sql.shuffle.partitions` to `setIfMissing` for
    consistency. This is functionally equivalent to the existing
    `System.getProperty` form, because `new SparkConf()` already loads
    `spark.*` JVM system properties.
    
    ### Why are the changes needed?
    
    `.set(...)` overwrites any value that `new SparkConf()` loaded from a
    `-Dspark.*` JVM property, so users can't tune the heap or broadcast
    threshold without editing source. At SF10 / SF100 the hardcoded 3g heap
    OOMs. `spark.sql.shuffle.partitions` already supported override in the
    same file; this extends the same pattern to the remaining three keys.
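    
    The difference between the two calls can be sketched as follows. This is
    a minimal illustration, not code from this patch: it assumes `spark-core`
    is on the classpath, the names `SetIfMissingDemo` and `benchmarkConf` are
    hypothetical, and the `-D` flag is simulated with `System.setProperty`.
    It also assumes no other `spark.*` property is set in the JVM.
    
    ```scala
    import org.apache.spark.SparkConf

    object SetIfMissingDemo {
      // Hypothetical helper mirroring the pattern used after this change.
      // `new SparkConf()` (loadDefaults = true) copies every `spark.*`
      // JVM system property into the conf before the chained calls run.
      def benchmarkConf(): SparkConf =
        new SparkConf()
          .setIfMissing("spark.driver.memory", "3g")
          .setIfMissing("spark.sql.autoBroadcastJoinThreshold",
            (20 * 1024 * 1024).toString)

      def main(args: Array[String]): Unit = {
        // Simulate launching the JVM with -Dspark.driver.memory=72g.
        System.setProperty("spark.driver.memory", "72g")

        val conf = benchmarkConf()
        // The user-supplied value survives; the unset key gets its default.
        println(conf.get("spark.driver.memory"))                  // 72g
        println(conf.get("spark.sql.autoBroadcastJoinThreshold")) // 20971520

        // The old form: .set(...) replaces the value loaded from the property.
        val hardcoded = new SparkConf().set("spark.driver.memory", "3g")
        println(hardcoded.get("spark.driver.memory"))             // 3g

        System.clearProperty("spark.driver.memory")
      }
    }
    ```
    
    With `setIfMissing`, the hardcoded value acts only as a fallback, which
    is why the defaults stay the same when no flag is passed.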
    
    ### Does this PR introduce _any_ user-facing change?
    
    No. Defaults unchanged.
    
    ### How was this patch tested?
    
    Verified locally that `-Dspark.driver.memory=72g` (etc.) flow through
    to the SparkConf when launched via:
    
    ```
    build/sbt -Dspark.driver.memory=72g \
              -Dspark.executor.memory=72g \
              -Dspark.sql.autoBroadcastJoinThreshold=10485760 \
              -Dspark.sql.shuffle.partitions=512 \
              "sql/Test/runMain ...TPCDSQueryBenchmark --data-location ..."
    ```
    
    Without these flags, defaults remain `3g / 3g / 20MB / 4` (driver
    memory / executor memory / broadcast threshold / shuffle partitions).
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    Generated-by: Claude Opus 4.7
    
    Closes #55988 from yaooqinn/SPARK-56948.
    
    Authored-by: Kent Yao <[email protected]>
    Signed-off-by: Kent Yao <[email protected]>
---
 .../spark/sql/execution/benchmark/TPCDSQueryBenchmark.scala       | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/TPCDSQueryBenchmark.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/TPCDSQueryBenchmark.scala
index c79f9f26d60d..c1ff0eb8458d 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/TPCDSQueryBenchmark.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/TPCDSQueryBenchmark.scala
@@ -51,10 +51,10 @@ object TPCDSQueryBenchmark extends SqlBasedBenchmark with Logging {
     val conf = new SparkConf()
       .setMaster(System.getProperty("spark.sql.test.master", "local[1]"))
       .setAppName("test-sql-context")
-      .set("spark.sql.shuffle.partitions", System.getProperty("spark.sql.shuffle.partitions", "4"))
-      .set("spark.driver.memory", "3g")
-      .set("spark.executor.memory", "3g")
-      .set("spark.sql.autoBroadcastJoinThreshold", (20 * 1024 * 1024).toString)
+      .setIfMissing("spark.sql.shuffle.partitions", "4")
+      .setIfMissing("spark.driver.memory", "3g")
+      .setIfMissing("spark.executor.memory", "3g")
+      .setIfMissing("spark.sql.autoBroadcastJoinThreshold", (20 * 1024 * 1024).toString)
       .set("spark.sql.crossJoin.enabled", "true")
       .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
       .set("spark.kryo.registrationRequired", "true")

