[ https://issues.apache.org/jira/browse/SPARK-17653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15524175#comment-15524175 ]
Xiao Li commented on SPARK-17653:
---------------------------------

Since Simon already submitted the PR, I will not continue the investigation. Thanks for answering my original question.

> Optimizer should remove unnecessary distincts (in multiple unions)
> ------------------------------------------------------------------
>
>                 Key: SPARK-17653
>                 URL: https://issues.apache.org/jira/browse/SPARK-17653
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Reynold Xin
>
> Query:
> {code}
> select 1 a union select 2 b union select 3 c
> {code}
> Explain plan:
> {code}
> == Physical Plan ==
> *HashAggregate(keys=[a#13], functions=[])
> +- Exchange hashpartitioning(a#13, 200)
>    +- *HashAggregate(keys=[a#13], functions=[])
>       +- Union
>          :- *HashAggregate(keys=[a#13], functions=[])
>          :  +- Exchange hashpartitioning(a#13, 200)
>          :     +- *HashAggregate(keys=[a#13], functions=[])
>          :        +- Union
>          :           :- *Project [1 AS a#13]
>          :           :  +- Scan OneRowRelation[]
>          :           +- *Project [2 AS b#14]
>          :              +- Scan OneRowRelation[]
>          +- *Project [3 AS c#15]
>             +- Scan OneRowRelation[]
> {code}
> Only one distinct should be necessary. This makes a bunch of unions slower than a bunch of union alls followed by a distinct.
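A minimal sketch of the rewrite the issue asks for, in Python over a hypothetical toy plan model. `Rows`, `UnionAll`, `Distinct` and all helper functions are illustrative assumptions, not Spark's Catalyst API or the fix in the PR. SQL `a UNION b` is modeled as `Distinct(UnionAll(a, b))`. Because distinct(distinct(X) + Y) equals distinct(X + Y), any `Distinct` that sits below another `Distinct` with only unions in between is redundant. The whole chain can therefore collapse to one `Distinct` over a flat `UnionAll`, which corresponds to one aggregate/exchange pair in the plan instead of one per `union`.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


# Hypothetical toy logical-plan nodes (illustrative only, not Spark's API).
@dataclass(frozen=True)
class Rows:
    """Leaf relation with literal rows, standing in for e.g. `select 1 a`."""
    rows: Tuple[int, ...]


@dataclass(frozen=True)
class UnionAll:
    """Bag union (SQL UNION ALL): concatenates children, keeps duplicates."""
    children: Tuple["Plan", ...]


@dataclass(frozen=True)
class Distinct:
    """Removes duplicate rows from its child."""
    child: "Plan"


Plan = Union[Rows, UnionAll, Distinct]


def union_distinct(left: Plan, right: Plan) -> Plan:
    """SQL `left UNION right`, modeled as a Distinct over a UNION ALL."""
    return Distinct(UnionAll((left, right)))


def _collect_inputs(plan: Plan, out: List[Plan]) -> None:
    """Gather union inputs below an outer Distinct, skipping inner Distincts.

    Inner Distincts are redundant here: the outer Distinct deduplicates anyway.
    """
    if isinstance(plan, UnionAll):
        for child in plan.children:
            _collect_inputs(child, out)
    elif isinstance(plan, Distinct):
        _collect_inputs(plan.child, out)
    else:
        out.append(plan)


def eliminate_nested_distincts(plan: Plan) -> Plan:
    """Rewrite so each chain of unions carries a single Distinct on top."""
    if isinstance(plan, Distinct):
        inputs: List[Plan] = []
        _collect_inputs(plan.child, inputs)
        # In this toy model the collected inputs are always Rows leaves.
        if len(inputs) == 1:
            return Distinct(inputs[0])
        return Distinct(UnionAll(tuple(inputs)))
    if isinstance(plan, UnionAll):
        # No outer Distinct here, so Distincts below must be kept; just recurse.
        return UnionAll(tuple(eliminate_nested_distincts(c) for c in plan.children))
    return plan


def evaluate(plan: Plan) -> List[int]:
    """Execute a plan; Distinct keeps the first occurrence of each row."""
    if isinstance(plan, Rows):
        return list(plan.rows)
    if isinstance(plan, UnionAll):
        return [row for child in plan.children for row in evaluate(child)]
    return list(dict.fromkeys(evaluate(plan.child)))


def count_distincts(plan: Plan) -> int:
    """Count Distinct operators (each costs an aggregation plus a shuffle)."""
    if isinstance(plan, Distinct):
        return 1 + count_distincts(plan.child)
    if isinstance(plan, UnionAll):
        return sum(count_distincts(c) for c in plan.children)
    return 0


# select 1 a union select 2 b union select 3 c  (parsed left-deep)
query = union_distinct(union_distinct(Rows((1,)), Rows((2,))), Rows((3,)))
optimized = eliminate_nested_distincts(query)
print(count_distincts(query), count_distincts(optimized))  # 2 1
print(evaluate(optimized))  # [1, 2, 3]
```

The rewrite only drops a `Distinct` that has another `Distinct` above it with nothing but unions in between. A `Distinct` under a plain UNION ALL with no outer deduplication changes the result, so the `UnionAll` branch only recurses and never strips it.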