[ https://issues.apache.org/jira/browse/SPARK-19017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15785417#comment-15785417 ]
Nattavut Sutyanyong commented on SPARK-19017:
---------------------------------------------

Under your interpretation, (2,1) not in (2,0) would evaluate to false. Spark returns (2,1), and so do many other SQL engines.

> NOT IN subquery with more than one column may return incorrect results
> ----------------------------------------------------------------------
>
>                 Key: SPARK-19017
>                 URL: https://issues.apache.org/jira/browse/SPARK-19017
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0
>            Reporter: Nattavut Sutyanyong
>
> When more than one column is used in the NOT IN, the query may return
> incorrect results if the data contains nulls. We can demonstrate the problem
> with the following data set and query:
> {code}
> Seq((2,1)).toDF("a1","b1").createOrReplaceTempView("t1")
> Seq[(java.lang.Integer,java.lang.Integer)]((1,null)).toDF("a2","b2").createOrReplaceTempView("t2")
> sql("select * from t1 where (a1,b1) not in (select a2,b2 from t2)").show
> +---+---+
> | a1| b1|
> +---+---+
> +---+---+
> {code}
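For reference, here is a small, Spark-independent Scala sketch of how standard SQL three-valued logic is usually applied to a multi-column NOT IN. It assumes that row-value equality is the AND of the per-column comparisons and that NOT IN is the negation of an OR over all subquery rows (so IN over an empty subquery is FALSE). The object `NotInSemantics` and its helpers are hypothetical names made up for this illustration, not Spark APIs; `Option[Boolean]` stands in for TRUE/FALSE/UNKNOWN, with `None` meaning UNKNOWN (NULL).

```scala
// Hypothetical model of SQL three-valued logic, for illustration only.
// Some(true) = TRUE, Some(false) = FALSE, None = UNKNOWN.
object NotInSemantics {
  type Bool3 = Option[Boolean]

  // SQL AND: FALSE dominates, then UNKNOWN, then TRUE.
  def and3(x: Bool3, y: Bool3): Bool3 = (x, y) match {
    case (Some(false), _) | (_, Some(false)) => Some(false)
    case (Some(true), Some(true))            => Some(true)
    case _                                   => None
  }

  // SQL OR: TRUE dominates, then UNKNOWN, then FALSE.
  def or3(x: Bool3, y: Bool3): Bool3 = (x, y) match {
    case (Some(true), _) | (_, Some(true)) => Some(true)
    case (Some(false), Some(false))        => Some(false)
    case _                                 => None
  }

  // SQL NOT: UNKNOWN stays UNKNOWN.
  def not3(x: Bool3): Bool3 = x.map(!_)

  // SQL '=': any comparison involving NULL (None) yields UNKNOWN.
  def eq3(x: Option[Int], y: Option[Int]): Bool3 =
    for (a <- x; b <- y) yield a == b

  // (a1, b1) = (a2, b2) is the AND of the column-wise comparisons.
  def rowEq(l: Seq[Option[Int]], r: Seq[Option[Int]]): Bool3 =
    l.zip(r).map { case (a, b) => eq3(a, b) }.foldLeft(Some(true): Bool3)(and3)

  // row IN (subquery) is the OR of row equalities over every subquery row.
  def in3(row: Seq[Option[Int]], sub: Seq[Seq[Option[Int]]]): Bool3 =
    sub.map(rowEq(row, _)).foldLeft(Some(false): Bool3)(or3)

  def notIn3(row: Seq[Option[Int]], sub: Seq[Seq[Option[Int]]]): Bool3 =
    not3(in3(row, sub))

  // A WHERE clause keeps a row only when the predicate is TRUE.
  def keeps(p: Bool3): Boolean = p.contains(true)

  def main(args: Array[String]): Unit = {
    val t1Row = Seq(Some(2), Some(1))
    // t2 from the report: a single row (1, null)
    println(notIn3(t1Row, Seq(Seq(Some(1), None))))    // Some(true): row kept
    // the (2,0) case from the comment
    println(notIn3(t1Row, Seq(Seq(Some(2), Some(0))))) // Some(true): row kept
    // contrast: the non-null column matches, the other is null
    println(notIn3(t1Row, Seq(Seq(Some(2), None))))    // None: row filtered
  }
}
```

Under this model, (2,1) NOT IN ((1,null)) is TRUE: 2 = 1 is already FALSE, which makes the whole row comparison FALSE regardless of the null, so the row should be returned and the empty result in the report looks wrong. Only a subquery row such as (2,null), where every non-null column matches, makes the predicate UNKNOWN and filters the row out.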