Allison Wang created SPARK-35080:
------------------------------------

             Summary: Correlated subqueries with equality predicates can return 
wrong results
                 Key: SPARK-35080
                 URL: https://issues.apache.org/jira/browse/SPARK-35080
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.2.0
            Reporter: Allison Wang


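Correlated subqueries that contain an aggregate can return wrong results even 
when the correlated predicates are equality predicates. In both examples below, 
the inner side of the equality is an expression rather than a plain column. 
The decorrelation framework does not currently support this type of correlated 
subquery, so such queries should be blocked in CheckAnalysis.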

Example 1:
{code:sql}
create or replace view t1(c) as values ('a'), ('b');
create or replace view t2(c) as values ('ab'), ('abc'), ('bc');

select c, (select count(*) from t2 where t1.c = substring(t2.c, 1, 1)) from t1;
{code}
Correct results: [(a, 2), (b, 1)]
Spark results:
{code:java}
+---+-----------------+
|c  |scalarsubquery(c)|
+---+-----------------+
|a  |1                |
|a  |1                |
|b  |1                |
+---+-----------------+{code}
Example 2:
{code:sql}
create or replace view t1(a, b) as values (0, 6), (1, 5), (2, 4), (3, 3);
create or replace view t2(c) as values (6);

select c, (select count(*) from t1 where a + b = c) from t2;{code}
Correct results: [(6, 4)]
Spark results:
{code:java}
+---+-----------------+
|c  |scalarsubquery(c)|
+---+-----------------+
|6  |1                |
|6  |1                |
|6  |1                |
|6  |1                |
+---+-----------------+
{code}
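
The wrong outputs in both examples are consistent with the count being computed once per distinct inner row and then joined back to the outer table on the predicate. This is an assumption drawn from the outputs alone, not something this report confirms. A minimal Python sketch of that reading follows; the helper names reference_counts and group_then_join are hypothetical and are not Spark APIs.

```python
from collections import Counter

def reference_counts(outer, inner, matches):
    """Scalar-subquery semantics: exactly one count per outer row."""
    return [(o, sum(1 for i in inner if matches(o, i))) for o in outer]

def group_then_join(outer, inner, matches):
    """Hypothetical model of the faulty plan (an assumption, not Spark code):
    count per distinct raw inner row, then left-outer-join on the predicate."""
    groups = Counter(inner)  # groups by the raw inner values, not the expression
    result = []
    for o in outer:
        hits = [(o, n) for i, n in groups.items() if matches(o, i)]
        result.extend(hits or [(o, 0)])  # unmatched outer rows keep a count of 0
    return result

# Example 1: t1.c = substring(t2.c, 1, 1)
ex1_outer = ["a", "b"]
ex1_inner = ["ab", "abc", "bc"]
ex1_pred = lambda c, s: c == s[:1]

# Example 2: a + b = c
ex2_outer = [6]
ex2_inner = [(0, 6), (1, 5), (2, 4), (3, 3)]
ex2_pred = lambda c, row: row[0] + row[1] == c

print(reference_counts(ex1_outer, ex1_inner, ex1_pred))  # expected results
print(group_then_join(ex1_outer, ex1_inner, ex1_pred))   # rows as in the Spark output
print(reference_counts(ex2_outer, ex2_inner, ex2_pred))
print(group_then_join(ex2_outer, ex2_inner, ex2_pred))
```

Under this model, grouping by the raw inner columns instead of by the expression used in the predicate leaves several groups that match the same outer row, which produces the duplicated rows with a count of 1.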

  



