[ 
https://issues.apache.org/jira/browse/SPARK-19712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16715460#comment-16715460
 ] 

ASF GitHub Bot commented on SPARK-19712:
----------------------------------------

dilipbiswal commented on a change in pull request #23211: [SPARK-19712][SQL] Move PullupCorrelatedPredicates and RewritePredicateSubquery after OptimizeSubqueries
URL: https://github.com/apache/spark/pull/23211#discussion_r240352498
 
 

 ##########
 File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ##########
 @@ -649,13 +664,16 @@ object CollapseProject extends Rule[LogicalPlan] {
 
   def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
     case p1 @ Project(_, p2: Project) =>
-      if (haveCommonNonDeterministicOutput(p1.projectList, p2.projectList)) {
+      if (haveCommonNonDeterministicOutput(p1.projectList, p2.projectList) ||
+        ScalarSubquery.hasScalarSubquery(p1.projectList) ||
+        ScalarSubquery.hasScalarSubquery(p2.projectList)) {
 
 Review comment:
   @cloud-fan, let me get back to you on this. I need to debug it again :-)



> EXISTS and Left Semi join do not produce the same plan
> ------------------------------------------------------
>
>                 Key: SPARK-19712
>                 URL: https://issues.apache.org/jira/browse/SPARK-19712
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 2.1.0
>            Reporter: Nattavut Sutyanyong
>            Priority: Major
>
> This problem was found during the development of SPARK-18874.
> The EXISTS form in the following query:
> {{sql("select * from t1 inner join t2 on t1.t1a=t2.t2a where exists (select 1 from t3 where t1.t1b=t3.t3b)")}}
> gives the optimized plan below:
> {code}
> == Optimized Logical Plan ==
> Join Inner, (t1a#7 = t2a#25)
> :- Join LeftSemi, (t1b#8 = t3b#58)
> :  :- Filter isnotnull(t1a#7)
> :  :  +- Relation[t1a#7,t1b#8,t1c#9] parquet
> :  +- Project [1 AS 1#271, t3b#58]
> :     +- Relation[t3a#57,t3b#58,t3c#59] parquet
> +- Filter isnotnull(t2a#25)
>    +- Relation[t2a#25,t2b#26,t2c#27] parquet
> {code}
> whereas a semantically equivalent Left Semi join query below:
> {{sql("select * from t1 inner join t2 on t1.t1a=t2.t2a left semi join t3 on t1.t1b=t3.t3b")}}
> gives the following optimized plan:
> {code}
> == Optimized Logical Plan ==
> Join LeftSemi, (t1b#8 = t3b#58)
> :- Join Inner, (t1a#7 = t2a#25)
> :  :- Filter (isnotnull(t1b#8) && isnotnull(t1a#7))
> :  :  +- Relation[t1a#7,t1b#8,t1c#9] parquet
> :  +- Filter isnotnull(t2a#25)
> :     +- Relation[t2a#25,t2b#26,t2c#27] parquet
> +- Project [t3b#58]
>    +- Relation[t3a#57,t3b#58,t3c#59] parquet
> {code}
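
To make the claim of semantic equivalence in the description above concrete, here is a minimal sketch in plain Scala. It is a toy model on in-memory collections, not Spark's implementation. The object name, the case classes, and both method names are hypothetical, and only the columns used in the join conditions are modeled. Columns are non-nullable `Int`s, so the `isnotnull` filters from the plans are not represented. The first method mirrors the shape of the EXISTS plan (semi join below the inner join); the second mirrors the LEFT SEMI JOIN plan (semi join above the inner join).

```scala
object SemiJoinShapes {
  // Hypothetical row types: only the join columns from the queries are modeled.
  final case class T1(t1a: Int, t1b: Int)
  final case class T2(t2a: Int)
  final case class T3(t3b: Int)

  // Shape of the EXISTS plan: t1 is semi-joined with t3 first,
  // and the surviving rows are then inner-joined with t2.
  def semiJoinBelowInnerJoin(t1: Seq[T1], t2: Seq[T2], t3: Seq[T3]): Seq[(T1, T2)] = {
    val t3Keys = t3.map(_.t3b).toSet
    for {
      a <- t1 if t3Keys.contains(a.t1b) // semi join: keep a row if any t3 match exists
      b <- t2 if a.t1a == b.t2a         // inner join on t1a = t2a
    } yield (a, b)
  }

  // Shape of the LEFT SEMI JOIN plan: t1 and t2 are inner-joined first,
  // and the joined rows are then semi-joined with t3.
  def semiJoinAboveInnerJoin(t1: Seq[T1], t2: Seq[T2], t3: Seq[T3]): Seq[(T1, T2)] = {
    val t3Keys = t3.map(_.t3b).toSet
    val joined = for {
      a <- t1
      b <- t2 if a.t1a == b.t2a         // inner join on t1a = t2a
    } yield (a, b)
    joined.filter { case (a, _) => t3Keys.contains(a.t1b) } // semi join on t1b = t3b
  }
}
```

Both methods return the same rows for the same input, which is why the two SQL forms would ideally optimize to the same plan. In a Spark shell, the actual plans can be compared by calling `.explain(true)` on each of the two `sql(...)` results.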


