[
https://issues.apache.org/jira/browse/SPARK-34946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Allison Wang updated SPARK-34946:
---------------------------------
Description:
Currently, Spark supports Aggregate to host correlated scalar subqueries, but
in some cases, those subqueries cannot be rewritten properly in the
`RewriteCorrelatedScalarSubquery` rule. Hence we should block these cases in
CheckAnalysis.
Case 1: correlated scalar subquery in the grouping expressions but not in
aggregate expressions
{code:java}
SELECT SUM(c2) FROM t t1 GROUP BY (SELECT SUM(c2) FROM t t2 WHERE t1.c1 = t2.c1)
{code}
We get this error:
{code:java}
java.lang.AssertionError: assertion failed: Expects 1 field, but got 2;
something went wrong in analysis
{code}
because the correlated scalar subquery is not rewritten properly:
{code:java}
== Optimized Logical Plan ==
Aggregate [scalar-subquery#5 [(c1#6 = c1#6#93)]], [sum(c2#7) AS sum(c2)#11L]
: +- Aggregate [c1#6], [sum(c2#7) AS sum(c2)#15L, c1#6 AS c1#6#93]
: +- LocalRelation [c1#6, c2#7]
+- LocalRelation [c1#6, c2#7]
{code}
Case 2: correlated scalar subquery in the aggregate expressions but not in the
grouping expressions
{code:java}
SELECT (SELECT SUM(c2) FROM t t2 WHERE t1.c1 = t2.c1), SUM(c2) FROM t t1 GROUP
BY c1
{code}
We get this error:
{code:java}
java.lang.IllegalStateException: Couldn't find sum(c2)#69L in
[c1#60,sum(c2#61)#64L]
{code}
because the transformed correlated scalar subquery output is not present in the
grouping expression of the Aggregate:
{code:java}
== Optimized Logical Plan ==
Aggregate [c1#60], [sum(c2)#69L AS scalarsubquery(c1)#70L, sum(c2#61) AS
sum(c2)#65L]
+- Project [c1#60, c2#61, sum(c2)#69L]
+- Join LeftOuter, (c1#60 = c1#60#95)
:- LocalRelation [c1#60, c2#61]
+- Aggregate [c1#60], [sum(c2#61) AS sum(c2)#69L, c1#60 AS c1#60#95]
+- LocalRelation [c1#60, c2#61]
{code}
was:
Currently, Spark supports Aggregate to host correlated scalar subqueries, but
in some cases, those subqueries cannot be rewritten properly in the
`RewriteCorrelatedScalarSubquery` rule. Hence we should block these cases in
CheckAnalysis.
Case 1: correlated scalar subquery in the grouping expressions but not in
aggregate expressions
{code:java}
SELECT SUM(c2) FROM t t1 GROUP BY (SELECT SUM(c2) FROM t t2 WHERE t1.c1 = t2.c1)
{code}
We get this error:
{code:java}
java.lang.AssertionError: assertion failed: Expects 1 field, but got 2;
something went wrong in analysis
{code}
because the correlated scalar subquery is not rewritten properly:
{code:java}
== Optimized Logical Plan ==
Aggregate [scalar-subquery#5 [(c1#6 = c1#6#93)]], [sum(c2#7) AS sum(c2)#11L]
: +- Aggregate [c1#6], [sum(c2#7) AS sum(c2)#15L, c1#6 AS c1#6#93]
: +- LocalRelation [c1#6, c2#7]
+- LocalRelation [c1#6, c2#7]
{code}
Case 2: correlated scalar subquery in the aggregate expressions but not in the
grouping expressions
{code:java}
SELECT (SELECT SUM(c2) FROM t t2 WHERE t1.c1 = t2.c1), SUM(c2) FROM t t1 GROUP
BY c1
{code}
We get this error
{code:java}
java.lang.IllegalStateException: Couldn't find sum(c2)#69L in
[c1#60,sum(c2#61)#64L]
{code}
because the transformed correlated scalar subquery output is not present in the
grouping expression of the Aggregate:
{code:java}
== Optimized Logical Plan ==
Aggregate [c1#60], [sum(c2)#69L AS scalarsubquery(c1)#70L, sum(c2#61) AS
sum(c2)#65L]
+- Project [c1#60, c2#61, sum(c2)#69L]
+- Join LeftOuter, (c1#60 = c1#60#95)
:- LocalRelation [c1#60, c2#61]
+- Aggregate [c1#60], [sum(c2#61) AS sum(c2)#69L, c1#60 AS c1#60#95]
+- LocalRelation [c1#60, c2#61]
{code}
> Block unsupported correlated scalar subquery in Aggregate
> ---------------------------------------------------------
>
> Key: SPARK-34946
> URL: https://issues.apache.org/jira/browse/SPARK-34946
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.2.0
> Reporter: Allison Wang
> Priority: Major
>
> Currently, Spark supports Aggregate to host correlated scalar subqueries, but
> in some cases, those subqueries cannot be rewritten properly in the
> `RewriteCorrelatedScalarSubquery` rule. Hence we should block these cases in
> CheckAnalysis.
>
> Case 1: correlated scalar subquery in the grouping expressions but not in
> aggregate expressions
> {code:java}
> SELECT SUM(c2) FROM t t1 GROUP BY (SELECT SUM(c2) FROM t t2 WHERE t1.c1 =
> t2.c1)
> {code}
> We get this error:
> {code:java}
> java.lang.AssertionError: assertion failed: Expects 1 field, but got 2;
> something went wrong in analysis
> {code}
> because the correlated scalar subquery is not rewritten properly:
> {code:java}
> == Optimized Logical Plan ==
> Aggregate [scalar-subquery#5 [(c1#6 = c1#6#93)]], [sum(c2#7) AS sum(c2)#11L]
> : +- Aggregate [c1#6], [sum(c2#7) AS sum(c2)#15L, c1#6 AS c1#6#93]
> : +- LocalRelation [c1#6, c2#7]
> +- LocalRelation [c1#6, c2#7]
> {code}
>
> Case 2: correlated scalar subquery in the aggregate expressions but not in
> the grouping expressions
> {code:java}
> SELECT (SELECT SUM(c2) FROM t t2 WHERE t1.c1 = t2.c1), SUM(c2) FROM t t1
> GROUP BY c1
> {code}
> We get this error:
> {code:java}
> java.lang.IllegalStateException: Couldn't find sum(c2)#69L in
> [c1#60,sum(c2#61)#64L]
> {code}
> because the transformed correlated scalar subquery output is not present in
> the grouping expression of the Aggregate:
> {code:java}
> == Optimized Logical Plan ==
> Aggregate [c1#60], [sum(c2)#69L AS scalarsubquery(c1)#70L, sum(c2#61) AS
> sum(c2)#65L]
> +- Project [c1#60, c2#61, sum(c2)#69L]
> +- Join LeftOuter, (c1#60 = c1#60#95)
> :- LocalRelation [c1#60, c2#61]
> +- Aggregate [c1#60], [sum(c2#61) AS sum(c2)#69L, c1#60 AS c1#60#95]
> +- LocalRelation [c1#60, c2#61]
> {code}
>
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]