[ https://issues.apache.org/jira/browse/SPARK-40862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wenchen Fan reassigned SPARK-40862:
-----------------------------------

    Assignee: Allison Wang

> Unexpected operators when rewriting scalar subqueries with non-deterministic expressions
> -----------------------------------------------------------------------------------------
>
>                 Key: SPARK-40862
>                 URL: https://issues.apache.org/jira/browse/SPARK-40862
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Allison Wang
>            Assignee: Allison Wang
>            Priority: Major
>
> Since SPARK-28379, Spark has supported non-aggregated single-row correlated subqueries. SPARK-40800 handles the majority of the cases, where the projects can be collapsed, but Spark can still throw an exception for single-row subqueries that contain non-deterministic expressions. For example:
> {code:sql}
> CREATE TEMP VIEW t1 AS SELECT ARRAY('a', 'b') a;
>
> SELECT (
>   SELECT array_sort(a, (i, j) -> rank[i] - rank[j])[0] + r + r AS sorted
>   FROM (SELECT MAP('a', 1, 'b', 2) rank, rand() AS r)
> ) FROM t1;
> {code}
> This throws an exception:
> {code}
> Unexpected operator Join Inner
> :- Aggregate [[a,b]], [[a,b] AS a#253]
> :  +- OneRowRelation
> +- Project [map(keys: [a,b], values: [1,2]) AS rank#241, rand(86882494013664043) AS r#242]
>    +- OneRowRelation
> in correlated subquery
> {code}
> This happens because, when rewriting correlated scalar subqueries, Spark checks whether a subquery is subject to the COUNT bug by splitting its plan into the part above the aggregate, the aggregate itself, and the part below the aggregate (see `splitSubquery` in the `RewriteCorrelatedScalarSubquery` rule). That pattern is very restrictive and does not work for non-aggregated single-row subqueries, which have no aggregate to split on. We should fix this issue.
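> To see why the projects cannot simply be collapsed here: the subquery references the non-deterministic alias `r` twice (`r + r`), and inlining `rand()` at each reference would change the query's semantics. A minimal sketch in plain Scala (toy code, not Spark's actual project-collapsing logic; names are illustrative):
> {code:scala}
> import scala.util.Random
>
> object NonDeterministicInlining extends App {
>   val rng = new Random(42)
>
>   // "SELECT r + r FROM (SELECT rand() AS r)": r is sampled once and reused.
>   def boundOnce(): Double = { val r = rng.nextDouble(); r + r }
>
>   // Naively collapsing the projects would inline the call at every reference,
>   // i.e. "SELECT rand() + rand()", which samples twice.
>   def inlined(): Double = rng.nextDouble() + rng.nextDouble()
>
>   // boundOnce() always returns exactly twice a single sample, while inlined()
>   // sums two independent samples, so the optimizer must keep the inner Project
>   // rather than duplicate a non-deterministic expression.
>   println(s"${boundOnce()} vs ${inlined()}")
> }
> {code}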
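> To make the failure mode concrete, here is a self-contained toy model of the split (hypothetical mini `Plan` ADT in Scala; it only mirrors the shape of `splitSubquery`, not Spark's actual Catalyst classes):
> {code:scala}
> // Hypothetical stand-ins for Catalyst's logical operators.
> sealed trait Plan
> case class Project(child: Plan) extends Plan
> case class Filter(child: Plan) extends Plan
> case class Aggregate(child: Plan) extends Plan
> case class Join(left: Plan, right: Plan) extends Plan
> case object OneRowRelation extends Plan
>
> // Sketch of the restrictive traversal: peel Projects/Filters off the top,
> // expecting to find an Aggregate on the spine; any other operator is fatal.
> @annotation.tailrec
> def splitSubquery(plan: Plan, topPart: List[Plan] = Nil): (List[Plan], Aggregate) =
>   plan match {
>     case agg: Aggregate     => (topPart.reverse, agg)
>     case p @ Project(child) => splitSubquery(child, p :: topPart)
>     case f @ Filter(child)  => splitSubquery(child, f :: topPart)
>     case other              => sys.error(s"Unexpected operator $other in correlated subquery")
>   }
>
> // After decorrelation, the single-row subquery above has the shape of the plan
> // in the error message: a Join whose children are an Aggregate and a Project.
> // The Join sits exactly where the traversal expects Projects over an Aggregate:
> // splitSubquery(Join(Aggregate(OneRowRelation), Project(OneRowRelation)))
> //   ==> error: "Unexpected operator Join(...) in correlated subquery"
> {code}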