This is an automated email from the ASF dual-hosted git repository.
yao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 3026926e0de3 [SPARK-55289][SQL] Fix flaky test in-set-operations.sql
by disabling broadcast join
3026926e0de3 is described below
commit 3026926e0de39dbd79cc44947f5d2bc151149755
Author: Kent Yao <[email protected]>
AuthorDate: Mon Feb 2 19:00:27 2026 +0800
[SPARK-55289][SQL] Fix flaky test in-set-operations.sql by disabling
broadcast join
### What changes were proposed in this pull request?
The SQLQueryTestSuite test `in-set-operations.sql` intermittently fails on
GitHub Actions CI with `SparkOutOfMemoryError` (`UNABLE_TO_ACQUIRE_MEMORY`).
**Root Cause:**
The test runs complex queries with multiple UNIONs and correlated
IN-subqueries. The physical plan contains 5 BroadcastHashJoin operators with
nested BroadcastExchange nodes, whose hash tables accumulate in memory. On
memory-constrained CI runners (~7GB RAM, 4GB JVM heap), the accumulated hash
tables can exhaust the available memory and trigger the OOM.
**Fix:**
Add `--SET spark.sql.autoBroadcastJoinThreshold=-1` to the test file to
disable broadcast joins. This forces SortMergeJoin, which can spill to disk if
needed, reducing peak memory usage.
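As a rough illustration of the effect (a sketch, not part of this patch), the same setting can be applied in a Spark SQL session and the plan inspected with `EXPLAIN`. The view `t2` and its column `t2a` are assumptions made for this example; only `t1` and its columns appear in this email.

```sql
-- Disable automatic broadcast joins for this session
-- (same setting the test file now applies via --SET).
SET spark.sql.autoBroadcastJoinThreshold=-1;

-- Hypothetical IN-subquery; t2 / t2a are assumed names for illustration.
-- With broadcast disabled, the plan is expected to show SortMergeJoin
-- rather than BroadcastHashJoin for the semi join.
EXPLAIN
SELECT t1a, t1b, t1c
FROM t1
WHERE t1a IN (SELECT t2a FROM t2);
```

Comparing the `EXPLAIN` output with and without the `SET` line is one way to confirm which join strategy the planner picks.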
### Why are the changes needed?
To fix a flaky test that intermittently fails on CI due to memory pressure.
### Does this PR introduce _any_ user-facing change?
No. This only affects test configuration.
### How was this patch tested?
- Ran `./build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite -- -z in-set-operations.sql"` - all tests passed
- The golden file is updated for a minor row reordering: rows with identical
ORDER BY keys have no deterministic relative order, so their positions changed
### Was this patch authored or co-authored using generative AI tooling?
Yes, GitHub Copilot CLI assisted with analysis and implementation.
Closes #54072 from yaooqinn/SPARK-55289-clean.
Authored-by: Kent Yao <[email protected]>
Signed-off-by: Kent Yao <[email protected]>
---
.../sql-tests/inputs/subquery/in-subquery/in-set-operations.sql | 1 +
.../sql-tests/results/subquery/in-subquery/in-set-operations.sql.out | 2 +-
2 files changed, 2 insertions(+), 1 deletion(-)
diff --git
a/sql/core/src/test/resources/sql-tests/inputs/subquery/in-subquery/in-set-operations.sql
b/sql/core/src/test/resources/sql-tests/inputs/subquery/in-subquery/in-set-operations.sql
index c6b6a338c9b1..3df495e0524f 100644
---
a/sql/core/src/test/resources/sql-tests/inputs/subquery/in-subquery/in-set-operations.sql
+++
b/sql/core/src/test/resources/sql-tests/inputs/subquery/in-subquery/in-set-operations.sql
@@ -1,6 +1,7 @@
-- A test suite for set-operations in parent side, subquery, and both
predicate subquery
-- It includes correlated cases.
--ONLY_IF spark
+--SET spark.sql.autoBroadcastJoinThreshold=-1
create temporary view t1 as select * from values
("val1a", 6S, 8, 10L, float(15.0), 20D, 20E2BD, timestamp '2014-04-04
01:00:00.000', date '2014-04-04'),
diff --git
a/sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/in-set-operations.sql.out
b/sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/in-set-operations.sql.out
index 13e1d8b56257..0669cdf0cf61 100644
---
a/sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/in-set-operations.sql.out
+++
b/sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/in-set-operations.sql.out
@@ -404,8 +404,8 @@ struct<t1a:string,t1b:smallint,t1c:int>
val1d NULL 16
val1a 16 12
val1e 10 NULL
-val1d 10 NULL
val1e 10 NULL
+val1d 10 NULL
val1b 8 16