[ 
https://issues.apache.org/jira/browse/SPARK-17348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15453285#comment-15453285
 ] 

Nattavut Sutyanyong commented on SPARK-17348:
---------------------------------------------

The root cause is in the Analysis phase where the correlated predicate {{T1.C2 
>= T2.C2}} is pulled up above the Aggregation operator.

 {noformat}
== Parsed Logical Plan ==
'Project ['c1]
+- 'Filter 'c1 IN (list#324)
   :  +- 'Project [unresolvedalias('max('t2.c1), None)]
   :     +- 'Filter ('t1.c2 >= 't2.c2)
   :        +- 'UnresolvedRelation `t2`
   +- 'UnresolvedRelation `t1`

== Analyzed Logical Plan ==
c1: int
Project [c1#334]
+- Filter predicate-subquery#324 [(c1#334 = max(c1)#339) && (c2#335 >= c2#337)]
   :  +- Aggregate [c2#337], [max(c1#336) AS max(c1)#339, c2#337]
   :     +- SubqueryAlias t2
   :        +- Project [_1#299 AS c1#336, _2#300 AS c2#337]
   :           +- LocalRelation [_1#299, _2#300]
   +- SubqueryAlias t1
      +- Project [_1#288 AS c1#334, _2#289 AS c2#335]
         +- LocalRelation [_1#288, _2#289]
{noformat}

The pull up is okay only when the comparison is an equality.

> Incorrect results from subquery transformation
> ----------------------------------------------
>
>                 Key: SPARK-17348
>                 URL: https://issues.apache.org/jira/browse/SPARK-17348
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Nattavut Sutyanyong
>
> {noformat}
> Seq((1,1)).toDF("c1","c2").createOrReplaceTempView("t1")
> Seq((1,1),(2,0)).toDF("c1","c2").createOrReplaceTempView("t2")
> sql("select c1 from t1 where c1 in (select max(t2.c1) from t2 where t1.c2 >= 
> t2.c2)").show
> +---+
> | c1|
> +---+
> |  1|
> +---+
> {noformat}
> The correct result of the above query should be an empty set. Here is an 
> explanation:
> Both rows from T2 satisfies the correlated predicate T1.C2 >= T2.C2 when 
> T1.C1 = 1 so both rows needs to be processed in the same group of the 
> aggregation process in the subquery. The result of the aggregation yields 
> MAX(T2.C1) as 2. Therefore, the result of the evaluation of the predicate 
> T1.C1 (which is 1) IN MAX(T2.C1) (which is 2) should be an empty set.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to