(spark) branch master updated: [SPARK-55289][SQL] Fix flaky test in-set-operations.sql by disabling broadcast join

yao Mon, 02 Feb 2026 03:00:47 -0800

This is an automated email from the ASF dual-hosted git repository.

yao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git



The following commit(s) were added to refs/heads/master by this push:
     new 3026926e0de3 [SPARK-55289][SQL] Fix flaky test in-set-operations.sql by disabling broadcast join
3026926e0de3 is described below

commit 3026926e0de39dbd79cc44947f5d2bc151149755
Author: Kent Yao <[email protected]>
AuthorDate: Mon Feb 2 19:00:27 2026 +0800

    [SPARK-55289][SQL] Fix flaky test in-set-operations.sql by disabling broadcast join
    
    ### What changes were proposed in this pull request?
    
    The SQLQueryTestSuite test `in-set-operations.sql` intermittently fails on GitHub Actions CI with `SparkOutOfMemoryError` (`UNABLE_TO_ACQUIRE_MEMORY`).
    
    **Root Cause:**
    The test runs complex queries with multiple UNIONs and correlated IN-subqueries. The physical plan contains 5 BroadcastHashJoin operations with nested BroadcastExchange nodes, whose hash tables accumulate in memory. On memory-constrained CI runners (~7GB RAM, 4GB JVM heap), this can exhaust the available memory and cause the OOM.
    
    **Fix:**
    Add `--SET spark.sql.autoBroadcastJoinThreshold=-1` to the test file to disable broadcast joins. This forces SortMergeJoin, which can spill to disk if needed, reducing peak memory usage.
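
    As a rough illustration of why `-1` turns broadcasting off, the sketch below models a size-based broadcast check in plain Python. It is a simplified, hypothetical model and not Spark's planner code: the function names are made up for this example, and the only Spark facts it relies on are the documented 10 MB default threshold and that a value of `-1` disables automatic broadcasting.

```python
# Simplified, hypothetical model of a size-based broadcast decision.
# This is NOT Spark's planner code; all names are illustrative only.

def can_broadcast_by_size(estimated_bytes: int, threshold_bytes: int) -> bool:
    """Return True when a join side is small enough to broadcast."""
    # A non-negative size can never be <= -1, so -1 disables broadcasting.
    return 0 <= estimated_bytes <= threshold_bytes


def choose_join(estimated_bytes: int, threshold_bytes: int) -> str:
    """Pick a join strategy name for one side of an equi-join."""
    if can_broadcast_by_size(estimated_bytes, threshold_bytes):
        return "BroadcastHashJoin"
    return "SortMergeJoin"


DEFAULT_THRESHOLD = 10 * 1024 * 1024  # documented default: 10 MB

small_side = 4 * 1024  # assumed size of a tiny test view, a few KB
print(choose_join(small_side, DEFAULT_THRESHOLD))  # BroadcastHashJoin
print(choose_join(small_side, -1))                 # SortMergeJoin
```

    Under this model even a very small test view is no longer eligible for broadcast once the threshold is negative, which matches the intent of the `--SET` line.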
    
    ### Why are the changes needed?
    
    To fix a flaky test that intermittently fails on CI due to memory pressure.
    
    ### Does this PR introduce _any_ user-facing change?
    
    No. This only affects test configuration.
    
    ### How was this patch tested?
    
    - Ran `./build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite -- -z in-set-operations.sql"`; all tests passed.
    - Updated the golden file for a minor row reordering: rows with identical ORDER BY keys have no deterministic relative order, so their order can change with the join strategy.
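
    The reordering of tied rows can be illustrated with a small plain-Python sketch. The rows are taken from the golden-file diff; the two arrival orders and the choice of `t1b` as the only sort key are assumptions made for the example, not the test's actual query.

```python
from collections import Counter

# Rows as (t1a, t1b, t1c); None stands in for SQL NULL.
# The two lists hold the same rows in two hypothetical arrival orders,
# for example as produced by two different join strategies.
arrival_a = [("val1e", 10, None), ("val1d", 10, None),
             ("val1e", 10, None), ("val1b", 8, 16)]
arrival_b = [("val1e", 10, None), ("val1e", 10, None),
             ("val1d", 10, None), ("val1b", 8, 16)]


def order_by_t1b_desc(rows):
    """Sort on t1b only (assumed sort key); ties keep their arrival order."""
    return sorted(rows, key=lambda row: row[1], reverse=True)


sorted_a = order_by_t1b_desc(arrival_a)
sorted_b = order_by_t1b_desc(arrival_b)

same_rows = Counter(sorted_a) == Counter(sorted_b)   # same multiset of rows
same_order = sorted_a == sorted_b                    # same row sequence
print(same_rows, same_order)  # True False
```

    Both outputs satisfy the sort, yet they differ line by line, which is why a golden file that compares exact text needs regenerating when the plan changes.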
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    Yes, GitHub Copilot CLI assisted with analysis and implementation.
    
    Closes #54072 from yaooqinn/SPARK-55289-clean.
    
    Authored-by: Kent Yao <[email protected]>
    Signed-off-by: Kent Yao <[email protected]>
---
 .../sql-tests/inputs/subquery/in-subquery/in-set-operations.sql         | 1 +
 .../sql-tests/results/subquery/in-subquery/in-set-operations.sql.out    | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/sql/core/src/test/resources/sql-tests/inputs/subquery/in-subquery/in-set-operations.sql b/sql/core/src/test/resources/sql-tests/inputs/subquery/in-subquery/in-set-operations.sql
index c6b6a338c9b1..3df495e0524f 100644
--- a/sql/core/src/test/resources/sql-tests/inputs/subquery/in-subquery/in-set-operations.sql
+++ b/sql/core/src/test/resources/sql-tests/inputs/subquery/in-subquery/in-set-operations.sql
@@ -1,6 +1,7 @@
 -- A test suite for set-operations in parent side, subquery, and both predicate subquery
 -- It includes correlated cases.
 --ONLY_IF spark
+--SET spark.sql.autoBroadcastJoinThreshold=-1
 
 create temporary view t1 as select * from values
   ("val1a", 6S, 8, 10L, float(15.0), 20D, 20E2BD, timestamp '2014-04-04 01:00:00.000', date '2014-04-04'),
diff --git a/sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/in-set-operations.sql.out b/sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/in-set-operations.sql.out
index 13e1d8b56257..0669cdf0cf61 100644
--- a/sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/in-set-operations.sql.out
+++ b/sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/in-set-operations.sql.out
@@ -404,8 +404,8 @@ struct<t1a:string,t1b:smallint,t1c:int>
 val1d  NULL    16
 val1a  16      12
 val1e  10      NULL
-val1d  10      NULL
 val1e  10      NULL
+val1d  10      NULL
 val1b  8       16
 
 


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
