[ https://issues.apache.org/jira/browse/SPARK-39893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wan Kun updated SPARK-39893: ---------------------------- Description: If all group expressions are foldable, the result of this aggregate will always be OneRowRelation. And if all aggregate expressions are foldable, we can add limit 1 to the child plan. For example: Table : {code:sql} create table tab(key int, value string) using parquet {code} Query : {code:sql} SELECT distinct 1001 as id , cast('2022-06-03' as date) AS DT FROM testData {code} can change to {code:sql} SELECT distinct 1001 as id , cast('2022-06-03' as date) AS DT FROM ( SELECT * FROM tab LIMIT 1 ) {code} Query: {code:sql} SELECT 1001 AS id, cast('2022-06-03' as date) AS DT FROM tab group by 'x' {code} is equal to {code:sql} SELECT 1001 AS id, cast('2022-06-03' as date) AS DT FROM ( SELECT * FROM tab LIMIT 1 ) group by 'x' {code} was: If all groupingExpressions and aggregateExpressions in a aggregate are foldable, we can remove this aggregate. For example, query : {code:java} SELECT distinct 1001 as id , cast('2022-06-03' as date) AS DT FROM testData {code} the grouping expressions are : *[1001, 2022-06-03]* the aggregate expressions are : *[1001 AS id#274, 2022-06-03 AS DT#275]* so we can skip scan table testData and remote the aggregate operation. Before this PR: {code:java} Aggregate [1001, 2022-06-03], [1001 AS id#274, 2022-06-03 AS DT#275], Statistics(sizeInBytes=16.0 EiB) +- SerializeFromObject, Statistics(sizeInBytes=8.0 EiB) +- ExternalRDD [obj#12], Statistics(sizeInBytes=8.0 EiB) {code} After this PR: {code:java} Project [1001 AS id#218, 2022-06-03 AS DT#219], Statistics(sizeInBytes=2.0 B) +- OneRowRelation, Statistics(sizeInBytes=1.0 B) {code} > Push limit 1 to the aggregate's child plan if grouping expressions and > aggregate expressions are foldable > --------------------------------------------------------------------------------------------------------- > > Key: SPARK-39893 > URL: https://issues.apache.org/jira/browse/SPARK-39893 > Project: Spark > Issue Type: Improvement > Components: SQL > Affects Versions: 3.4.0 > Reporter: Wan Kun > Priority: Major > > If all group expressions are foldable, the result of this aggregate will > always be OneRowRelation. > And if all aggregate expressions are foldable, we can add limit 1 to the > child plan. > For example: > Table : > {code:sql} > create table tab(key int, value string) using parquet > {code} > Query : > {code:sql} > SELECT distinct 1001 as id , cast('2022-06-03' as date) AS DT > FROM testData > {code} > can change to > {code:sql} > SELECT distinct 1001 as id , cast('2022-06-03' as date) AS DT > FROM ( > SELECT * > FROM tab > LIMIT 1 > ) > {code} > Query: > {code:sql} > SELECT 1001 AS id, cast('2022-06-03' as date) AS DT > FROM tab > group by 'x' > {code} > is equal to > {code:sql} > SELECT 1001 AS id, cast('2022-06-03' as date) AS DT > FROM ( > SELECT * > FROM tab > LIMIT 1 > ) > group by 'x' > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org