[
https://issues.apache.org/jira/browse/SPARK-17348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15453285#comment-15453285
]
Nattavut Sutyanyong edited comment on SPARK-17348 at 8/31/16 8:37 PM:
----------------------------------------------------------------------
The root cause is in the rule {{ResolveSubquery}} in Analysis phase where the
correlated predicate {{T1.C2 >= T2.C2}} is pulled up above the Aggregation
operator.
{noformat}
== Parsed Logical Plan ==
'Project ['c1]
+- 'Filter 'c1 IN (list#324)
: +- 'Project [unresolvedalias('max('t2.c1), None)]
: +- 'Filter ('t1.c2 >= 't2.c2)
: +- 'UnresolvedRelation `t2`
+- 'UnresolvedRelation `t1`
== Analyzed Logical Plan ==
c1: int
Project [c1#334]
+- Filter predicate-subquery#324 [(c1#334 = max(c1)#339) && (c2#335 >= c2#337)]
: +- Aggregate [c2#337], [max(c1#336) AS max(c1)#339, c2#337]
: +- SubqueryAlias t2
: +- Project [_1#299 AS c1#336, _2#300 AS c2#337]
: +- LocalRelation [_1#299, _2#300]
+- SubqueryAlias t1
+- Project [_1#288 AS c1#334, _2#289 AS c2#335]
+- LocalRelation [_1#288, _2#289]
{noformat}
The pull up is okay only when the comparison is an equality.
was (Author: nsyca):
The root cause is in the Analysis phase where the correlated predicate {{T1.C2
>= T2.C2}} is pulled up above the Aggregation operator.
{noformat}
== Parsed Logical Plan ==
'Project ['c1]
+- 'Filter 'c1 IN (list#324)
: +- 'Project [unresolvedalias('max('t2.c1), None)]
: +- 'Filter ('t1.c2 >= 't2.c2)
: +- 'UnresolvedRelation `t2`
+- 'UnresolvedRelation `t1`
== Analyzed Logical Plan ==
c1: int
Project [c1#334]
+- Filter predicate-subquery#324 [(c1#334 = max(c1)#339) && (c2#335 >= c2#337)]
: +- Aggregate [c2#337], [max(c1#336) AS max(c1)#339, c2#337]
: +- SubqueryAlias t2
: +- Project [_1#299 AS c1#336, _2#300 AS c2#337]
: +- LocalRelation [_1#299, _2#300]
+- SubqueryAlias t1
+- Project [_1#288 AS c1#334, _2#289 AS c2#335]
+- LocalRelation [_1#288, _2#289]
{noformat}
The pull up is okay only when the comparison is an equality.
> Incorrect results from subquery transformation
> ----------------------------------------------
>
> Key: SPARK-17348
> URL: https://issues.apache.org/jira/browse/SPARK-17348
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.0.0
> Reporter: Nattavut Sutyanyong
>
> {noformat}
> Seq((1,1)).toDF("c1","c2").createOrReplaceTempView("t1")
> Seq((1,1),(2,0)).toDF("c1","c2").createOrReplaceTempView("t2")
> sql("select c1 from t1 where c1 in (select max(t2.c1) from t2 where t1.c2 >=
> t2.c2)").show
> +---+
> | c1|
> +---+
> | 1|
> +---+
> {noformat}
> The correct result of the above query should be an empty set. Here is an
> explanation:
> Both rows from T2 satisfies the correlated predicate T1.C2 >= T2.C2 when
> T1.C1 = 1 so both rows needs to be processed in the same group of the
> aggregation process in the subquery. The result of the aggregation yields
> MAX(T2.C1) as 2. Therefore, the result of the evaluation of the predicate
> T1.C1 (which is 1) IN MAX(T2.C1) (which is 2) should be an empty set.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]