Frederick Reiss created SPARK-15370:
---------------------------------------

             Summary: Some correlated subqueries return incorrect answers
                 Key: SPARK-15370
                 URL: https://issues.apache.org/jira/browse/SPARK-15370
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.0.0
            Reporter: Frederick Reiss


The rewrite introduced in SPARK-14785 has the COUNT bug. The rewrite changes 
the semantics of some correlated subqueries when there are tuples from the 
outer query block that do not join with the subquery. For example:
{noformat}
spark-sql> create table R(a integer) as values (1);
spark-sql> create table S(b integer);
spark-sql> select R.a from R 
         >     where (select count(*) from S where R.a = S.b) = 0;
Time taken: 2.139 seconds                                                       
spark-sql> 
(returns zero rows; the answer should be one row of '1')
{noformat}
This problem also affects the SELECT clause:
{noformat}
spark-sql> select R.a, 
         >     (select count(*) from S where R.a = S.b) as cnt 
         > from R;
1       NULL
(the answer should be "1 0")
{noformat}
Some subqueries with COUNT aggregates are *not* affected:
{noformat}
spark-sql> select R.a from R 
         >     where (select count(*) from S where R.a = S.b) > 0;
Time taken: 0.609 seconds
spark-sql>
(Correct answer)

spark-sql> select R.a from R 
         >     where (select count(*) + sum(S.b) from S where R.a = S.b) = 0;
Time taken: 0.553 seconds
spark-sql> 
(Correct answer)
{noformat}

Other cases can trigger the variant of the COUNT bug for expressions involving 
NULL checks:
{noformat}
spark-sql> select R.a from R 
         > where (select sum(S.b) is null from S where R.a = S.b);
(returns zero rows, should return one row)
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to