[ https://issues.apache.org/jira/browse/SPARK-55289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-55289:
-----------------------------------
    Labels: pull-request-available  (was: )

> [SQL] Fix flaky test in-set-operations.sql by disabling broadcast join
> ----------------------------------------------------------------------
>
>                 Key: SPARK-55289
>                 URL: https://issues.apache.org/jira/browse/SPARK-55289
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 4.2.0
>            Reporter: Kent Yao
>            Priority: Major
>              Labels: pull-request-available
>
> The SQLQueryTestSuite test 'in-set-operations.sql' intermittently fails on 
> GitHub Actions CI with SparkOutOfMemoryError (UNABLE_TO_ACQUIRE_MEMORY).
>
> **Root Cause:**
> The test runs complex queries with multiple UNIONs and correlated 
> IN-subqueries. The physical plan contains 5 BroadcastHashJoin operators with 
> nested BroadcastExchange nodes, so several broadcast hash tables are held in 
> memory at the same time. On memory-constrained CI runners (~7GB RAM, 4GB JVM 
> heap), this can cause an OOM under memory pressure.
>
> **Fix:**
> Add `--SET spark.sql.autoBroadcastJoinThreshold=-1` to the test file to 
> disable broadcast joins. This forces SortMergeJoin, which can spill to disk 
> when needed, reducing peak memory usage.
>
> **Impact:**
> - The test still validates SQL correctness (the logical results are the same).
> - Only the physical execution strategy changes (broadcast → shuffle).
> - The golden file is updated for minor row reordering among rows whose ORDER BY 
> keys are equal (their relative order is non-deterministic).
>
> **Files Changed:**
> - sql/core/src/test/resources/sql-tests/inputs/subquery/in-subquery/in-set-operations.sql
> - sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/in-set-operations.sql.out
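The Root Cause note attributes the OOM to the hash tables built for BroadcastHashJoin. The following toy, single-process model is for illustration only and is not Spark's implementation. It shows why a hash join is memory-hungry: the entire build side is materialized in an in-memory table before any probe row is processed. The function name `hash_join` and the dict-based row format are hypothetical.

```python
from collections import defaultdict


def hash_join(build_rows, probe_rows, key):
    """Toy equi-join on `key`: build an in-memory hash table, then probe it.

    Returns the joined pairs and the number of build rows held in memory.
    """
    # Build phase: the whole build side is materialized at once. In Spark's
    # BroadcastHashJoin the broadcast build side is likewise held in memory
    # as a hashed relation, so each such join in a plan adds another table.
    table = defaultdict(list)
    for row in build_rows:
        table[row[key]].append(row)
    held_rows = sum(len(group) for group in table.values())

    # Probe phase: stream the other side and look up matches.
    out = []
    for row in probe_rows:
        for match in table.get(row[key], []):
            out.append((row, match))
    return out, held_rows


if __name__ == "__main__":
    build = [{"k": 1, "v": "a"}, {"k": 2, "v": "b"}, {"k": 2, "v": "c"}]
    probe = [{"k": 2, "w": "x"}, {"k": 3, "w": "y"}]
    pairs, held = hash_join(build, probe, "k")
    print(len(pairs), held)  # 2 3
```

Memory in this model grows with the size of the build side rather than the size of the output. A plan with five such joins keeps five tables alive at the same time.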
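The Fix note relies on SortMergeJoin needing less memory. The following toy sketch is not Spark's SortMergeJoinExec, and the name `sort_merge_join` is hypothetical. It shows the merge phase: once both sides are sorted on the join key, only the rows that share the current key need to be buffered. Here, `sorted()` stands in for Spark's external sorter, which is the component that can actually spill to disk. This sketch keeps everything in memory.

```python
from itertools import groupby
from operator import itemgetter


def sort_merge_join(left_rows, right_rows, key):
    """Toy equi-join on `key` by sorting both sides and merging them.

    Returns the joined pairs and the largest number of right-side rows
    buffered at once during the merge.
    """
    get_key = itemgetter(key)
    # Sort phase: Spark uses an external sorter that can spill to disk here;
    # this sketch simply sorts in memory.
    left = groupby(sorted(left_rows, key=get_key), key=get_key)
    right = groupby(sorted(right_rows, key=get_key), key=get_key)

    out = []
    peak_buffered = 0
    l_item = next(left, None)
    r_item = next(right, None)
    # Merge phase: advance whichever side currently has the smaller key.
    while l_item is not None and r_item is not None:
        l_key, l_group = l_item
        r_key, r_group = r_item
        if l_key < r_key:
            l_item = next(left, None)
        elif l_key > r_key:
            r_item = next(right, None)
        else:
            # Only the right-side rows for the current key are buffered.
            buffered = list(r_group)
            peak_buffered = max(peak_buffered, len(buffered))
            for l_row in l_group:
                for r_row in buffered:
                    out.append((l_row, r_row))
            l_item = next(left, None)
            r_item = next(right, None)
    return out, peak_buffered


if __name__ == "__main__":
    left = [{"k": 2, "a": 1}, {"k": 1, "a": 2}, {"k": 3, "a": 3}]
    right = [{"k": 2, "b": "x"}, {"k": 2, "b": "y"}, {"k": 4, "b": "z"}]
    pairs, peak = sort_merge_join(left, right, "k")
    print(len(pairs), peak)  # 2 2
```

The trade-off is an extra sort (and, in Spark, a shuffle) on both sides. In exchange, peak memory during the merge depends on the largest key group rather than on an entire input.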
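The Impact note mentions that the golden file changed only where ORDER BY keys are equal. The following small sketch uses hypothetical rows and a hypothetical helper, `order_by`. It shows how the same set of rows can come out in a different order when the rows reach the sort in a different order, while the result as a multiset stays identical.

```python
def order_by(rows, key_index):
    """Sort rows by a single column, like ORDER BY on a non-unique key."""
    # Python's sorted() is stable, so rows tied on the key keep their input
    # order. SQL ORDER BY guarantees nothing about ties, and the input order
    # can change when the join strategy changes.
    return sorted(rows, key=lambda row: row[key_index])


# Hypothetical rows: the same multiset arriving in two different orders,
# e.g. from a broadcast hash join vs. a sort-merge join.
from_broadcast = [("b", 1), ("a", 1), ("c", 0)]
from_sort_merge = [("a", 1), ("c", 0), ("b", 1)]

if __name__ == "__main__":
    print(order_by(from_broadcast, 1))   # [('c', 0), ('b', 1), ('a', 1)]
    print(order_by(from_sort_merge, 1))  # [('c', 0), ('a', 1), ('b', 1)]
    # Same rows overall, so the query result is logically unchanged.
    print(sorted(from_broadcast) == sorted(from_sort_merge))  # True
```

Adding a tie-breaking column to the ORDER BY would make such output fully deterministic. The ticket instead updates the golden file to match the new order.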



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
