[ 
https://issues.apache.org/jira/browse/SPARK-19017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15799369#comment-15799369
 ] 

Herman van Hovell commented on SPARK-19017:
-------------------------------------------

I agree that they are equal.

It just seems odd to me that in some cases it is fine for the tuples in the 
subquery to contain null values. That being said, I am convinced that your 
approach is correct.

I have also checked whether comparing structs returns the same result as 
comparing the underlying fields one by one. In some cases it does not:
{noformat}
scala> sql("select (2, 2) <> (2, cast(null as int)) as c1, 2 <> 1 or 2 <> cast(null as int) as c2").show
+----+----+
|  c1|  c2|
+----+----+
|true|true|
+----+----+

scala> sql("select (1, 2) <> (2, cast(null as int)) as c1, 1 <> 1 or 2 <> cast(null as int) as c2").show
+----+----+
|  c1|  c2|
+----+----+
|true|null| <-- Result for struct is wrong.
+----+----+
{noformat}
We fortunately do not use this, but this is still a bug.
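
For reference, the c2 column above follows from ordinary SQL three-valued 
logic. The sketch below models a SQL boolean as Option[Boolean], with None 
standing in for NULL, and evaluates the two c2 expressions exactly as written 
in the queries. It is plain Scala and only an illustration, not how Spark 
evaluates these expressions; the names ThreeValuedNeq, neq and or3 are made up 
for this example.

```scala
object ThreeValuedNeq {
  // A SQL boolean: Some(true), Some(false), or None for NULL.
  type SqlBool = Option[Boolean]

  // a <> b yields NULL as soon as either operand is NULL.
  def neq(a: Option[Int], b: Option[Int]): SqlBool =
    for (x <- a; y <- b) yield x != y

  // SQL OR: true dominates NULL, and NULL dominates false.
  def or3(l: SqlBool, r: SqlBool): SqlBool = (l, r) match {
    case (Some(true), _) | (_, Some(true)) => Some(true)
    case (Some(false), Some(false))        => Some(false)
    case _                                 => None
  }

  def main(args: Array[String]): Unit = {
    // 2 <> 1 or 2 <> cast(null as int)
    println(or3(neq(Some(2), Some(1)), neq(Some(2), None))) // Some(true)
    // 1 <> 1 or 2 <> cast(null as int)
    println(or3(neq(Some(1), Some(1)), neq(Some(2), None))) // None, i.e. NULL
  }
}
```

Both values agree with the c2 column printed by the two queries.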

> NOT IN subquery with more than one column may return incorrect results
> ----------------------------------------------------------------------
>
>                 Key: SPARK-19017
>                 URL: https://issues.apache.org/jira/browse/SPARK-19017
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0
>            Reporter: Nattavut Sutyanyong
>
> When a NOT IN predicate has more than one column, the query may return 
> incorrect results if the data contains nulls. We can demonstrate the 
> problem with the following data set and query:
> {code}
> Seq((2,1)).toDF("a1","b1").createOrReplaceTempView("t1")
> Seq[(java.lang.Integer,java.lang.Integer)]((1,null)).toDF("a2","b2").createOrReplaceTempView("t2")
> sql("select * from t1 where (a1,b1) not in (select a2,b2 from t2)").show
> +---+---+
> | a1| b1|
> +---+---+
> +---+---+
> {code}
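
The expected answer for the query quoted above can be worked out by hand with 
three-valued logic: (a1, b1) NOT IN (subquery) is the negation of an OR, over 
the subquery rows, of a1 = a2 AND b1 = b2. The sketch below does this in plain 
Scala, with None standing in for NULL. It is an illustration of the standard 
SQL semantics under that assumption, not Spark's implementation, and the names 
NotInSketch, eq3, and3, or3, not3 and notIn are invented for this example.

```scala
object NotInSketch {
  // A SQL boolean: Some(true), Some(false), or None for NULL.
  type SqlBool = Option[Boolean]
  // A two-column row of nullable ints.
  type Row2 = (Option[Int], Option[Int])

  // a = b yields NULL as soon as either operand is NULL.
  def eq3(a: Option[Int], b: Option[Int]): SqlBool =
    for (x <- a; y <- b) yield x == y

  // SQL AND: false dominates NULL, and NULL dominates true.
  def and3(l: SqlBool, r: SqlBool): SqlBool = (l, r) match {
    case (Some(false), _) | (_, Some(false)) => Some(false)
    case (Some(true), Some(true))            => Some(true)
    case _                                   => None
  }

  // SQL OR: true dominates NULL, and NULL dominates false.
  def or3(l: SqlBool, r: SqlBool): SqlBool = (l, r) match {
    case (Some(true), _) | (_, Some(true)) => Some(true)
    case (Some(false), Some(false))        => Some(false)
    case _                                 => None
  }

  // SQL NOT: NULL stays NULL.
  def not3(v: SqlBool): SqlBool = v.map(!_)

  // row NOT IN sub == NOT (OR over sub rows of column-wise equality).
  def notIn(row: Row2, sub: Seq[Row2]): SqlBool = {
    val matches = sub.map(r => and3(eq3(row._1, r._1), eq3(row._2, r._2)))
    not3(matches.foldLeft[SqlBool](Some(false))(or3))
  }

  def main(args: Array[String]): Unit = {
    val t1: Seq[Row2] = Seq((Some(2), Some(1)))
    val t2: Seq[Row2] = Seq((Some(1), None))
    // A WHERE clause keeps a row only when the predicate is true.
    t1.filter(r => notIn(r, t2).contains(true)).foreach(println)
  }
}
```

Here 2 = 1 is false, so the AND is false even though 1 = NULL is NULL. The IN 
is therefore false, the NOT IN is true, and the row (2, 1) is kept, whereas 
the output quoted above is empty.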



