[ 
https://issues.apache.org/jira/browse/SPARK-17337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15452419#comment-15452419
 ] 

Nattavut Sutyanyong commented on SPARK-17337:
---------------------------------------------

@hvanhovell has observed that by disabling the rule 
{{PushPredicateThroughJoin}} will solve this problem. The predicate is pushed 
down because the code thinks all the columns in the LeftAnti predicate 
reference to the T2, the right table of (T1 LOJ T2). The fact is some of the 
C2#77 in the LeftAnti predicate are from the T2 in the NOT IN subquery.

However, {{PushPredicateThroughJoin}} is not the root cause of the problem. The 
problem is the identifier {{[value#75 AS c2#77]}} is used in two places from 
the two unrelated references of {{SubqueryAlias t2}}. This can be demonstrated 
by replacing the T2 in the subquery by a different name, SQ in the example 
below. We will not see the LeftAnti predicate pushed down below the LOJ.

{noformat}
Seq(1).toDF("cx").createOrReplaceTempView("sq")

scala> sql("select * from (select t2.c2+1 as c3 from t1 left join t2 on 
t1.c1=t2.c2) t3 where c3 not in (select cx from sq)").explain(true)

...

== Optimized Logical Plan ==
Project [(c2#77 + 1) AS c3#137]
+- Join LeftAnti, (isnull(((c2#77 + 1) = cx#133)) || ((c2#77 + 1) = cx#133))
   :- Join LeftOuter, (c1#3 = c2#77)
   :  :- LocalRelation [c1#3]
   :  +- LocalRelation [c2#77]
   +- LocalRelation [cx#133]
...
{noformat}

> Incomplete algorithm for name resolution in Catalyst paser may lead to 
> incorrect result
> ---------------------------------------------------------------------------------------
>
>                 Key: SPARK-17337
>                 URL: https://issues.apache.org/jira/browse/SPARK-17337
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Nattavut Sutyanyong
>
> While investigating SPARK-16951, I found an incorrect results case from a NOT 
> IN subquery. I thought originally it is an edge case. Further investigation 
> found this is a more general problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to