Mihailo Timotic created SPARK-52531:
---------------------------------------
Summary: `OuterReference` in subquery aggregate is incorrectly
tied to outer query aggregate
Key: SPARK-52531
URL: https://issues.apache.org/jira/browse/SPARK-52531
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 4.1.0
Reporter: Mihailo Timotic
For a query like:
{{}}
{code:java}
{code}
{{SELECT MAX(a.col1) }}
{{FROM VALUES (7) AS a(col1) }}
{{GROUP BY a.col1 }}
{{HAVING COUNT(*) = ( }}
{{ SELECT COUNT(*) }}
{{ FROM VALUES (7),(7),(8),(8) AS c(col1) }}
{{ WHERE c.col1 >= a.col1 }}
{{ GROUP BY c.col1 }}
{{ LIMIT 1 }}
{{) }}
{{ORDER BY a.col1 DESC;}}
Spark outputs result: `7`
{{}}
However, this is an incorrect result and both MySQL and DuckDB correctly output
0 rows
!image-2025-06-19-10-08-02-417.png!
!image-2025-06-19-10-08-15-034.png!
This is because count(*) from the subquery under Having is incorrectly tied as
an outer reference to the count(*) from the outer query:
{{}}
{code:java}
Project [max(col1)#3081747] +- Sort [col1#3081725 DESC NULLS LAST], true +-
Project [max(col1)#3081747, col1#3081725] +- Filter (count(1)#3081750L =
scalar-subquery#3081727 [col1#3081725]) : +- GlobalLimit 1 :
+- LocalLimit 1 : +- Aggregate [col1#3081726],
[outer(count(1)#3081750L) AS count(1)#3081749L] : +- Filter
(col1#3081726 >= outer(col1#3081725)) : +- SubqueryAlias
c : +- LocalRelation [col1#3081726] +-
Aggregate [col1#3081725], [max(col1#3081725) AS max(col1)#3081747, count(1) AS
count(1)#3081750L, col1#3081725] +- SubqueryAlias a
+- LocalRelation [col1#3081725]{code}
The outer reference on count(*) is introduced in UpdateOuterReferences rule.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]