[ https://issues.apache.org/jira/browse/SPARK-18966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15796762#comment-15796762 ]
Nattavut Sutyanyong commented on SPARK-18966:
---------------------------------------------
Making a note that the query
{code}
select *
from   t1
where  a1 not in (select a2
                  from   t2
                  where  t2.b2 = t1.b1)
{code}
is not semantically equivalent to
{code}
select *
from   t1
where  (a1, b1) not in (select a2, b2
                        from   t2)
{code}
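To illustrate why the two forms differ, here is a minimal sketch that evaluates both predicates by hand under standard SQL three-valued logic, using the data from the issue description (t1 = (1, 2), t2 = (1, NULL)). It does not use Spark and is not Spark's implementation; the object NotInSemantics and all of its helper names are hypothetical and exist only for this example. NULL is modelled as None, and UNKNOWN as a None of type Option[Boolean].

```scala
object NotInSemantics {
  // SQL three-valued logic: Some(true), Some(false), or None for UNKNOWN.
  type Tri = Option[Boolean]

  // SQL equality: UNKNOWN when either operand is NULL (None).
  def sqlEq(x: Option[Int], y: Option[Int]): Tri =
    for (a <- x; b <- y) yield a == b

  def and3(p: Tri, q: Tri): Tri = (p, q) match {
    case (Some(false), _) | (_, Some(false)) => Some(false)
    case (Some(true), Some(true))            => Some(true)
    case _                                   => None
  }

  def or3(p: Tri, q: Tri): Tri = (p, q) match {
    case (Some(true), _) | (_, Some(true)) => Some(true)
    case (Some(false), Some(false))        => Some(false)
    case _                                 => None
  }

  def not3(p: Tri): Tri = p.map(!_)

  // x NOT IN (s1, s2, ...) is NOT (x = s1 OR x = s2 OR ...).
  // Over an empty subquery result the OR is FALSE, so NOT IN is TRUE.
  def notIn(comparisons: Seq[Tri]): Tri =
    not3(comparisons.foldLeft[Tri](Some(false))(or3))

  case class T1(a1: Option[Int], b1: Option[Int])
  case class T2(a2: Option[Int], b2: Option[Int])

  // a1 NOT IN (SELECT a2 FROM t2 WHERE t2.b2 = t1.b1)
  def correlated(t1: Seq[T1], t2: Seq[T2]): Seq[T1] =
    t1.filter { r =>
      // The inner WHERE keeps only rows whose predicate is TRUE,
      // so a NULL b2 drops the t2 row before NOT IN is evaluated.
      val sub = t2.filter(s => sqlEq(s.b2, r.b1).contains(true))
      notIn(sub.map(s => sqlEq(r.a1, s.a2))).contains(true)
    }

  // (a1, b1) NOT IN (SELECT a2, b2 FROM t2)
  def multiColumn(t1: Seq[T1], t2: Seq[T2]): Seq[T1] =
    t1.filter { r =>
      // Every t2 row takes part in the comparison; a NULL b2 makes
      // the row comparison UNKNOWN instead of removing the row.
      val cmp = t2.map(s => and3(sqlEq(r.a1, s.a2), sqlEq(r.b1, s.b2)))
      notIn(cmp).contains(true)
    }

  def main(args: Array[String]): Unit = {
    val t1 = Seq(T1(Some(1), Some(2)))
    val t2 = Seq(T2(Some(1), None))
    println(correlated(t1, t2))  // correlated form
    println(multiColumn(t1, t2)) // multi-column form
  }
}
```

With this data the correlated form keeps the row (1, 2), because the predicate NULL = 2 is UNKNOWN and the subquery result is therefore empty. The multi-column form returns no rows, because (1, 2) = (1, NULL) is UNKNOWN and so is its negation.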
> NOT IN subquery with correlated expressions may return incorrect result
> -----------------------------------------------------------------------
>
> Key: SPARK-18966
> URL: https://issues.apache.org/jira/browse/SPARK-18966
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.0.0
> Reporter: Nattavut Sutyanyong
> Labels: correctness
>
> {code}
> Seq((1, 2)).toDF("a1", "b1").createOrReplaceTempView("t1")
> Seq[(java.lang.Integer, java.lang.Integer)]((1, null)).toDF("a2", "b2").createOrReplaceTempView("t2")
> // The expected result is 1 row of (1,2) as shown in the next statement.
> sql("select * from t1 where a1 not in (select a2 from t2 where b2 = b1)").show
> +---+---+
> | a1| b1|
> +---+---+
> +---+---+
> sql("select * from t1 where a1 not in (select a2 from t2 where b2 = 2)").show
> +---+---+
> | a1| b1|
> +---+---+
> |  1|  2|
> +---+---+
> {code}
> The two SQL statements above should return the same result.