[ 
https://issues.apache.org/jira/browse/SPARK-36339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-36339.
---------------------------------
    Fix Version/s: 3.2.0
       Resolution: Fixed

Issue resolved by pull request 33574
[https://github.com/apache/spark/pull/33574]

> aggsBuffer should collect AggregateExpression in the map range
> --------------------------------------------------------------
>
>                 Key: SPARK-36339
>                 URL: https://issues.apache.org/jira/browse/SPARK-36339
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.8, 3.0.3, 3.1.2
>            Reporter: gaoyajun02
>            Priority: Major
>              Labels: grouping
>             Fix For: 3.2.0
>
>
> A demo of this issue:
> {code:sql}
> -- SQL without error
> SELECT name, count(name) c
> FROM VALUES ('Alice'), ('Bob') people(name)
> GROUP BY name GROUPING SETS(name);
> -- An error is reported after swapping the order of the selected columns:
> SELECT count(name) c, name
> FROM VALUES ('Alice'), ('Bob') people(name)
> GROUP BY name GROUPING SETS(name);
> {code}
> The error message is:
> {code:java}
> Error in query: expression 'people.`name`' is neither present in the group 
> by, nor is it an aggregate function. Add to group by or wrap in first() (or 
> first_value) if you don't care which value you get.;;
> Aggregate [name#5, spark_grouping_id#3], [count(name#1) AS c#0L, name#1]
> +- Expand [List(name#1, name#4, 0)], [name#1, name#5, spark_grouping_id#3]
>    +- Project [name#1, name#1 AS name#4]
>       +- SubqueryAlias `people`
>          +- LocalRelation [name#1]
> {code}
> So far, I have confirmed that the problem does not occur before version 2.3.
>  
> During debugging, I found that the behavior of constructAggregateExprs in 
> ResolveGroupingAnalytics has changed.
> {code:java}
>     /*
>      * Construct new aggregate expressions by replacing grouping functions.
>      */
>     private def constructAggregateExprs(
>         groupByExprs: Seq[Expression],
>         aggregations: Seq[NamedExpression],
>         groupByAliases: Seq[Alias],
>         groupingAttrs: Seq[Expression],
>         gid: Attribute): Seq[NamedExpression] = aggregations.map {
>       // collect all the found AggregateExpression, so we can check an expression is part of
>       // any AggregateExpression or not.
>       val aggsBuffer = ArrayBuffer[Expression]()
>       // Returns whether the expression belongs to any expressions in `aggsBuffer` or not.
>       def isPartOfAggregation(e: Expression): Boolean = {
>         aggsBuffer.exists(a => a.find(_ eq e).isDefined)
>       }
>       replaceGroupingFunc(_, groupByExprs, gid).transformDown {
>         // AggregateExpression should be computed on the unmodified value of its argument
>         // expressions, so we should not replace any references to grouping expression
>         // inside it.
>         case e: AggregateExpression =>
>           aggsBuffer += e
>           e
>         case e if isPartOfAggregation(e) => e
>         case e =>
>           // Replace expression by expand output attribute.
>           val index = groupByAliases.indexWhere(_.child.semanticEquals(e))
>           if (index == -1) {
>             e
>           } else {
>             groupingAttrs(index)
>           }
>       }.asInstanceOf[NamedExpression]
>     }
> {code}
> In aggregations.map, aggsBuffer appears to be declared outside the function 
> that map applies to each element. As a result, it accumulates the 
> AggregateExpressions of every element the map function has processed so far, 
> rather than only those of the current element. This was not the case before 2.3.


