[
https://issues.apache.org/jira/browse/SPARK-41141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17635989#comment-17635989
]
Asif commented on SPARK-41141:
------------------------------
Opened the following PR
[SPARK-41141-PR|https://github.com/apache/spark/pull/38714/files]
> avoid introducing a new aggregate expression in the analysis phase when
> subquery is referencing it
> --------------------------------------------------------------------------------------------------
>
> Key: SPARK-41141
> URL: https://issues.apache.org/jira/browse/SPARK-41141
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.3.1
> Reporter: Asif
> Priority: Minor
> Labels: spark-sql
>
> Currently the analyzer phase rules on subquery referencing the aggregate
> expression in outer query, avoids introducing a new aggregate only for a
> single level aggregate function. It introduces new aggregate expression for
> nested aggregate functions.
> It is possible to avoid adding this extra aggregate expression easily,
> atleast if the outer projection involving aggregate function is exactly same
> as the one that is used in subquery, or if the outer query's projection
> involving aggregate function is a subtree of the subquery's expression.
>
> Thus consider the following 2 cases:
> 1) select cos (sum(a)) , b from t1 group by b having exists (select x from
> t2 where y = cos(sum(a)) )
> 2) select sum(a) , b from t1 group by b having exists (select x from t2
> where y = cos(sum(a)) )
>
> In both the above cases, there is no need for adding extra aggregate
> expression.
>
> I am also investigating if its possible to avoid if the case is
>
> 3) select Cos(sum(a)) , b from t1 group by b having exists (select x from
> t2 where y = sum(a) )
>
> This Jira also is needed for another issue where subquery datasource v2 is
> projecting columns which are not needed. ( no Jira filed yet for that, will
> do that..)
>
> Will be opening a PR for this soon..
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]