[ https://issues.apache.org/jira/browse/SPARK-36478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17397276#comment-17397276 ]
Apache Spark commented on SPARK-36478:
--------------------------------------

User 'wankunde' has created a pull request for this issue:
https://github.com/apache/spark/pull/33702

> Removes outer join if all grouping and aggregate expressions are from the streamed side
> ---------------------------------------------------------------------------------------
>
>                 Key: SPARK-36478
>                 URL: https://issues.apache.org/jira/browse/SPARK-36478
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Wan Kun
>            Priority: Minor
>
> Removes outer join if all grouping and aggregate expressions are from the streamed side.
> For example:
> {code:java}
> spark.range(200L).selectExpr("id AS a").createTempView("t1")
> spark.range(300L).selectExpr("id AS b").createTempView("t2")
> spark.sql("SELECT DISTINCT a FROM t1 LEFT JOIN t2 ON a = b").explain(true){code}
> Current optimized plan:
> {code:java}
> == Optimized Logical Plan ==
> Aggregate [b#3L], [b#3L, max(c#4L) AS c#20L]
> +- Project [b#3L, c#4L]
>    +- Join LeftOuter, (a#2L = a#10L)
>       :- Project [id#0L AS a#2L, id#0L AS b#3L, id#0L AS c#4L]
>       :  +- Range (0, 200, step=1, splits=Some(1))
>       +- Project [id#8L AS a#10L]
>          +- Range (0, 300, step=1, splits=Some(1))
> {code}
> Expected optimized plan:
> {code:java}
> == Optimized Logical Plan ==
> Aggregate [b#277L], [b#277L, max(c#278L) AS c#290L]
> +- Project [id#274L AS b#277L, id#274L AS c#278L]
>    +- Range (0, 200, step=1, splits=Some(2))
> {code}
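The reasoning behind the proposal can be illustrated without Spark. The following is a minimal sketch in plain Scala, not the code from the pull request: `leftOuterJoin` and `OuterJoinRemovalSketch` are hypothetical names introduced only for illustration, and the duplicate keys added to `t2` are an assumption chosen to make the effect visible. It shows that a left outer join keeps every row of the streamed (left) side at least once, so aggregates that ignore duplicates, such as `DISTINCT` or `max` over left-side columns, give the same answer whether or not the join is performed, while `count` does not.

```scala
object OuterJoinRemovalSketch {
  // Hypothetical stand-in for a single-key LEFT OUTER JOIN:
  // every left row survives at least once; unmatched rows pair with None,
  // and a left row is repeated once per matching right row.
  def leftOuterJoin(left: Seq[Long], right: Seq[Long]): Seq[(Long, Option[Long])] =
    left.flatMap { a =>
      val matches = right.filter(_ == a)
      if (matches.isEmpty) Seq((a, Option.empty[Long]))
      else matches.map(b => (a, Option(b)))
    }

  def main(args: Array[String]): Unit = {
    // Streamed (left) side, same range as t1 in the issue.
    val t1: Seq[Long] = (0L until 200L).toVector
    // Other side, same range as t2, plus duplicate keys (an assumption for illustration).
    val t2: Seq[Long] = (0L until 300L).toVector ++ Seq(5L, 5L)

    val joined = leftOuterJoin(t1, t2)

    // SELECT DISTINCT a: identical with and without the join.
    assert(joined.map(_._1).distinct == t1.distinct)

    // max over a left-side column ignores duplicated rows, so it is also unchanged.
    assert(joined.map(_._1).max == t1.max)

    // A duplicate-sensitive aggregate such as count differs,
    // so the join could not be dropped for it.
    assert(joined.size != t1.size)
  }
}
```

The last assertion is the reason the rewrite would need to be restricted to duplicate-insensitive aggregates whose inputs all come from the streamed side.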