[ 
https://issues.apache.org/jira/browse/SPARK-36747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allison Wang updated SPARK-36747:
---------------------------------
    Description: 
Currently CollapseProject combines Project with Aggregate when the shared 
attributes are deterministic. But if there are correlated scalar subqueries in 
the project list that uses the output of the aggregate, they cannot be 
combined. Otherwise, the plan after rewrite will not be valid:

{code}
select (select sum(c2) from t where c1 = cast(s as int)) from (select sum(c2) s 
from t)

== Optimized Logical Plan ==
Aggregate [sum(c2)#10L AS scalarsubquery(s)#11L]
+- Project [sum(c2)#10L]
   +- Join LeftOuter, (c1#2 = cast(sum(c2#3) as int))
      :- LocalRelation [c2#3]
      +- Aggregate [c1#2], [sum(c2#3) AS sum(c2)#10L, c1#2]
         +- LocalRelation [c1#2, c2#3]

java.lang.UnsupportedOperationException: Cannot generate code for expression: 
sum(input[0, int, false])
{code}

  was:
Currently CollapseProject combines Project with Aggregate when the shared 
attributes are deterministic. But if there are correlated scalar subqueries in 
the project list that uses the output of the aggregate, they cannot be 
combined. Otherwise, the plan after rewrite will not be valid:

{code}
select (select sum(c2) from t where c1 = cast(s as int)) from (select sum(c2) s 
from t)

== Optimized Logical Plan ==
Aggregate [sum(b)#28L AS scalarsubquery(s)#29L]
+- Project [sum(b)#28L]
   +- Join LeftOuter, (a#20 = cast(sum(b#21) as int))
      :- LocalRelation [b#21]
      +- Aggregate [a#20], [sum(b#21) AS sum(b)#28L, a#20]
         +- LocalRelation [a#20, b#21]

java.lang.UnsupportedOperationException: Cannot generate code for expression: 
sum(input[0, int, false])
{code}


> Do not collapse Project with Aggregate when correlated subqueries are present 
> in the project list
> -------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-36747
>                 URL: https://issues.apache.org/jira/browse/SPARK-36747
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Allison Wang
>            Priority: Major
>
> Currently CollapseProject combines Project with Aggregate when the shared 
> attributes are deterministic. But if there are correlated scalar subqueries 
> in the project list that uses the output of the aggregate, they cannot be 
> combined. Otherwise, the plan after rewrite will not be valid:
> {code}
> select (select sum(c2) from t where c1 = cast(s as int)) from (select sum(c2) 
> s from t)
> == Optimized Logical Plan ==
> Aggregate [sum(c2)#10L AS scalarsubquery(s)#11L]
> +- Project [sum(c2)#10L]
>    +- Join LeftOuter, (c1#2 = cast(sum(c2#3) as int))
>       :- LocalRelation [c2#3]
>       +- Aggregate [c1#2], [sum(c2#3) AS sum(c2)#10L, c1#2]
>          +- LocalRelation [c1#2, c2#3]
> java.lang.UnsupportedOperationException: Cannot generate code for expression: 
> sum(input[0, int, false])
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to