[ https://issues.apache.org/jira/browse/SPARK-55289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated SPARK-55289:
-----------------------------------
Labels: pull-request-available (was: )
> [SQL] Fix flaky test in-set-operations.sql by disabling broadcast join
> ----------------------------------------------------------------------
>
> Key: SPARK-55289
> URL: https://issues.apache.org/jira/browse/SPARK-55289
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 4.2.0
> Reporter: Kent Yao
> Priority: Major
> Labels: pull-request-available
>
> The SQLQueryTestSuite test 'in-set-operations.sql' intermittently fails on
> GitHub Actions CI with SparkOutOfMemoryError (UNABLE_TO_ACQUIRE_MEMORY).
> **Root Cause:**
> The test runs complex queries with multiple UNIONs and correlated
> IN-subqueries. The physical plan contains 5 BroadcastHashJoin operators with
> nested BroadcastExchange nodes, whose hash tables accumulate in memory. On
> memory-constrained CI runners (~7GB RAM, 4GB JVM heap), this can exhaust the
> available memory when the runner is under pressure, causing the OOM.
> **Fix:**
> Add `--SET spark.sql.autoBroadcastJoinThreshold=-1` to the test file to
> disable broadcast joins. The planner then uses SortMergeJoin instead, which
> can spill to disk when needed, reducing peak memory usage.
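The effect of the `-1` threshold can be seen in a deliberately simplified model of size-based join selection. The model makes several assumptions. It covers an equi-join with no join hints and a single candidate build side. When broadcasting is not allowed, it falls back to SortMergeJoin. The function `choose_join` and its parameters are hypothetical and do not mirror Spark's planner code. Estimated sizes are never negative, so a threshold of -1 can never be satisfied and broadcasting is ruled out entirely.

```python
# Hypothetical, simplified model of size-based join selection (not Spark's code).
# Assumptions: equi-join, no join hints, one candidate build side, and a
# fallback to SortMergeJoin whenever broadcasting is not allowed.

DEFAULT_THRESHOLD = 10 * 1024 * 1024  # default spark.sql.autoBroadcastJoinThreshold (10MB)


def choose_join(build_side_bytes: int, threshold: int) -> str:
    """Return the join strategy this simplified model would pick."""
    # Broadcast only if the estimated build side fits under the threshold.
    # Estimated sizes are non-negative, so threshold = -1 never matches.
    if 0 <= build_side_bytes <= threshold:
        return "BroadcastHashJoin"
    return "SortMergeJoin"


if __name__ == "__main__":
    small_table = 64 * 1024  # a 64 KiB build side
    print(choose_join(small_table, DEFAULT_THRESHOLD))  # BroadcastHashJoin
    print(choose_join(small_table, -1))                 # SortMergeJoin
```

In the real planner, explicit broadcast hints and join-type restrictions also affect the choice. This sketch only shows why `-1` disables size-based broadcasting.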
> **Impact:**
> - The test still validates SQL correctness (the logical results are the same)
> - Only the physical execution strategy changes (broadcast → shuffle)
> - The golden file is updated for minor reordering of rows whose ORDER BY keys
> are equal, since their relative order is non-deterministic
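A small illustration shows why only the row order in the golden file changes. The row values are made up, and Python's stable `sorted` merely stands in for ORDER BY. Both outputs below are correctly ordered by the key and contain the same rows. They differ only in how the tied rows are arranged, and that arrangement depends on the order in which the upstream plan happened to produce them.

```python
# Hypothetical rows (name, key); values are illustrative only.
# Two physical plans may emit the same rows in different orders before sorting.
rows_from_broadcast_plan = [("b", 1), ("c", 1), ("a", 0)]
rows_from_shuffle_plan = [("c", 1), ("b", 1), ("a", 0)]


def order_by_key(rows):
    # Sort on the key column only, like ORDER BY key.
    # Python's sort is stable, so tied rows keep their input order here.
    return sorted(rows, key=lambda row: row[1])


before = order_by_key(rows_from_broadcast_plan)
after = order_by_key(rows_from_shuffle_plan)

print(before)  # [('a', 0), ('b', 1), ('c', 1)]
print(after)   # [('a', 0), ('c', 1), ('b', 1)]
```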
> **Files Changed:**
> - sql/core/src/test/resources/sql-tests/inputs/subquery/in-subquery/in-set-operations.sql
> - sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/in-set-operations.sql.out
--
This message was sent by Atlassian Jira (v8.20.10#820010)