[ https://issues.apache.org/jira/browse/SPARK-32216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-32216:
------------------------------------

Assignee: (was: Apache Spark)

> Remove redundant ProjectExec
> ----------------------------
>
> Key: SPARK-32216
> URL: https://issues.apache.org/jira/browse/SPARK-32216
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Allison Wang
> Priority: Major
>
> Currently, an executed Spark plan can contain redundant `ProjectExec` nodes. For example:
>
> After Filter:
> {code:java}
> == Physical Plan ==
> *(1) Project [a#14L, b#15L, c#16, key#17]
> +- *(1) Filter (isnotnull(a#14L) AND (a#14L > 5))
>    +- *(1) ColumnarToRow
>       +- FileScan parquet [a#14L,b#15L,c#16,key#17]
> {code}
> The `Project [a#14L, b#15L, c#16, key#17]` is redundant because its output is exactly the same as the filter's output.
>
> Before Aggregate:
> {code:java}
> == Physical Plan ==
> *(2) HashAggregate(keys=[key#17], functions=[sum(a#14L), last(b#15L, false)], output=[sum_a#39L, key#17, last_b#41L])
> +- Exchange hashpartitioning(key#17, 5), true, [id=#77]
>    +- *(1) HashAggregate(keys=[key#17], functions=[partial_sum(a#14L), partial_last(b#15L, false)], output=[key#17, sum#49L, last#50L, valueSet#51])
>       +- *(1) Project [key#17, a#14L, b#15L]
>          +- *(1) Filter (isnotnull(a#14L) AND (a#14L > 100))
>             +- *(1) ColumnarToRow
>                +- FileScan parquet [a#14L,b#15L,key#17]
> {code}
> The `Project [key#17, a#14L, b#15L]` is redundant because the hash aggregate doesn't require its child's output attributes to be in a specific order; the project only reorders the filter's output `[a#14L, b#15L, key#17]`.
>
> In general, a project is redundant when:
> # it has the same output attributes, in the same order, as its child's output, when the ordering of these attributes is required; or
> # it has the same output attributes as its child's output, in any order, when attribute ordering is not required.
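The two redundancy conditions above can be illustrated with a minimal Python sketch. It models a plan as a chain of single-child nodes, each carrying an ordered list of output attribute names (plain names stand in for Spark's expression IDs such as `a#14L`), and drops a `Project` node when its output matches its child's output under whichever condition applies. Everything here (`Node`, `requires_child_order`, `is_redundant`, `remove_redundant_projects`, `names`) is hypothetical and is not Spark's `SparkPlan` API or its actual implementation. It assumes each operator declares whether it depends on its child's attribute order: pass-through operators such as `Filter` are treated conservatively as order-dependent, `Project` and `HashAggregate` as order-independent, and the root's output order is always treated as required.

```python
from collections import Counter
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical single-child plan node (not Spark's SparkPlan API)."""
    name: str                       # operator name, e.g. "Project", "Filter"
    output: List[str]               # output attribute names, in order
    child: Optional["Node"] = None  # None for leaf nodes such as a scan
    # Assumption: whether this operator depends on the attribute order of
    # its child's output. Defaults to True (conservative for pass-through ops).
    requires_child_order: bool = True


def is_redundant(project: Node, order_required: bool) -> bool:
    """Apply the two conditions from the issue to a Project node."""
    child_out = project.child.output
    if order_required:
        # Condition 1: same attributes in the same order.
        return project.output == child_out
    # Condition 2: same attributes; order does not matter.
    return Counter(project.output) == Counter(child_out)


def remove_redundant_projects(node: Node, order_required: bool = True) -> Node:
    """Rewrite the child first, then drop this node if it is a redundant Project.

    `order_required` says whether the parent depends on this node's output
    order; it defaults to True because the root's output order is observable.
    """
    if node.child is None:
        return node
    new_child = remove_redundant_projects(node.child, node.requires_child_order)
    node = replace(node, child=new_child)
    if node.name == "Project" and is_redundant(node, order_required):
        return new_child
    return node


def names(node: Optional[Node]) -> List[str]:
    """Operator names from the root down, for printing a plan."""
    result = []
    while node is not None:
        result.append(node.name)
        node = node.child
    return result


# Example 1 from the issue: a Project directly above a Filter at the root.
scan1 = Node("FileScan", ["a", "b", "c", "key"])
filter1 = Node("Filter", scan1.output, Node("ColumnarToRow", scan1.output, scan1))
plan1 = Node("Project", ["a", "b", "c", "key"], filter1, requires_child_order=False)
print(names(remove_redundant_projects(plan1)))
# ['Filter', 'ColumnarToRow', 'FileScan']

# Example 2: a reordering Project below a partial HashAggregate.
scan2 = Node("FileScan", ["a", "b", "key"])
filter2 = Node("Filter", scan2.output, Node("ColumnarToRow", scan2.output, scan2))
proj2 = Node("Project", ["key", "a", "b"], filter2, requires_child_order=False)
plan2 = Node("HashAggregate", ["key", "sum", "last", "valueSet"], proj2,
             requires_child_order=False)
print(names(remove_redundant_projects(plan2)))
# ['HashAggregate', 'Filter', 'ColumnarToRow', 'FileScan']
```

Rewriting children before parents lets a stack of projects collapse in a single pass. The same reordering `Project [key, a, b]` would be kept if it sat at the root, since there the output order is observable. A real implementation would also have to handle multi-child operators such as joins, which this sketch omits.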
--
This message was sent by Atlassian Jira
(v8.3.4#803005)