Nattavut Sutyanyong created SPARK-17348:
-------------------------------------------

             Summary: Incorrect results from subquery transformation
                 Key: SPARK-17348
                 URL: https://issues.apache.org/jira/browse/SPARK-17348
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.0.0
            Reporter: Nattavut Sutyanyong


{noformat}
Seq((1,1)).toDF("c1","c2").createOrReplaceTempView("t1")
Seq((1,1),(2,0)).toDF("c1","c2").createOrReplaceTempView("t2")
sql("select c1 from t1 where c1 in (select max(t2.c1) from t2 where t1.c2 >= 
t2.c2)").show

+---+
| c1|
+---+
|  1|
+---+
{noformat}

The correct result of the above query should be an empty set. Here is an 
explanation:

Both rows from T2 satisfies the correlated predicate T1.C2 >= T2.C2 when T1.C1 
= 1 so both rows needs to be processed in the same group of the aggregation 
process in the subquery. The result of the aggregation yields MAX(T2.C1) as 2. 
Therefore, the result of the evaluation of the predicate T1.C1 (which is 1) IN 
MAX(T2.C1) (which is 2) should be an empty set.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to