[ https://issues.apache.org/jira/browse/SPARK-32361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ulysses you updated SPARK-32361:
--------------------------------
    Description:
We can remove some redundant Project nodes once column pruning has completed.

e.g.,
{code:java}
create table t1(c1 int, c2 int) using parquet;
explain extended
select sum(c1) from (
  select * from t1
);
{code}
Currently we get this plan:
{code:java}
== Physical Plan ==
*(2) HashAggregate(keys=[], functions=[sum(cast(c1#19 as bigint))], output=[sum(c1)#68L])
+- Exchange SinglePartition, true, [id=#86]
   +- *(1) HashAggregate(keys=[], functions=[partial_sum(cast(c1#19 as bigint))], output=[sum#70L])
      +- *(1) Project [c1#19]
         +- *(1) ColumnarToRow
            +- FileScan parquet default.t1[c1#19] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex[hdfs:///user/hive/warehouse/t1], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<c1:int>
{code}
The `Project` is redundant here because it outputs exactly the column its child already produces, so we can remove it, like this:
{code:java}
== Physical Plan ==
*(2) HashAggregate(keys=[], functions=[sum(cast(c1#19 as bigint))], output=[sum(c1)#68L])
+- Exchange SinglePartition, true, [id=#86]
   +- *(1) HashAggregate(keys=[], functions=[partial_sum(cast(c1#19 as bigint))], output=[sum#70L])
      +- *(1) ColumnarToRow
         +- FileScan parquet default.t1[c1#19] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex[hdfs:///user/hive/warehouse/t1], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<c1:int>
{code}

> Remove project if output is subset of child
> -------------------------------------------
>
>                 Key: SPARK-32361
>                 URL: https://issues.apache.org/jira/browse/SPARK-32361
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: ulysses you
>            Priority: Minor
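
To make the proposed rewrite concrete, here is a minimal sketch on a toy plan tree. It is not Spark's Catalyst code: the `Node` class and the `remove_redundant_project` and `render` helpers are hypothetical names invented for illustration. The sketch only covers the case in the plans above, where the Project's output is identical to its child's output. The broader "subset" condition in the issue title is not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Toy plan node (hypothetical): operator name, output attributes, optional child."""
    name: str
    output: List[str]
    child: Optional["Node"] = None


def remove_redundant_project(plan: Node) -> Node:
    """Return a copy of the plan without Projects whose output equals their child's output."""
    if plan.child is None:
        return plan
    # Rewrite bottom-up so the comparison sees the already-simplified child.
    child = remove_redundant_project(plan.child)
    if plan.name == "Project" and plan.output == child.output:
        return child  # the Project adds nothing, so splice it out
    return Node(plan.name, plan.output, child)


def render(plan: Optional[Node]) -> List[str]:
    """List operator names from the root down to the leaf."""
    names = []
    while plan is not None:
        names.append(plan.name)
        plan = plan.child
    return names


# Toy version of the "current" plan from the issue description.
scan = Node("FileScan", ["c1#19"])
to_row = Node("ColumnarToRow", ["c1#19"], scan)
project = Node("Project", ["c1#19"], to_row)
partial = Node("HashAggregate", ["sum#70L"], project)
exchange = Node("Exchange", ["sum#70L"], partial)
final = Node("HashAggregate", ["sum(c1)#68L"], exchange)

optimized = remove_redundant_project(final)
print(render(optimized))
```

In this toy model the rewritten tree has the same operator chain as the second plan in the description: the Project between the partial HashAggregate and ColumnarToRow is gone and every other node is unchanged.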