Jack Chen created SPARK-45009:
---------------------------------
Summary: Correlated EXISTS subqueries in join ON condition
unsupported and fail with internal error
Key: SPARK-45009
URL: https://issues.apache.org/jira/browse/SPARK-45009
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 3.4.0
Reporter: Jack Chen
Correlated EXISTS subqueries in a join ON condition are not handled in
decorrelation, and we also don’t have any checks to block them, so these queries
keep outer references in the query plan, which leads to internal errors:
{code:java}
CREATE TEMP VIEW x(x1, x2) AS VALUES (0, 1), (1, 2);
CREATE TEMP VIEW y(y1, y2) AS VALUES (0, 2), (0, 3);
CREATE TEMP VIEW z(z1, z2) AS VALUES (0, 2), (0, 3);
SELECT * FROM x LEFT JOIN y ON x1 = y1 AND EXISTS (SELECT * FROM z WHERE z1 = x1);

Error occurred during query planning:
org.apache.spark.sql.catalyst.plans.logical.Filter cannot be cast to org.apache.spark.sql.execution.SparkPlan
{code}
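For reference, the rows this query should return under ordinary SQL semantics can be checked in another engine. The sketch below uses Python's built-in sqlite3 module as a stand-in, which is an assumption: SQLite is not Spark, and plain tables replace the temp views. It also runs a hypothetical rewrite that moves the EXISTS into a filter over y (valid here because the join requires x1 = y1). That rewrite puts the subquery under a Filter, the shape the existing rules are described as handling, but it has not been verified on Spark here.

```python
import sqlite3

# In-memory SQLite database standing in for Spark (assumption: illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE x(x1, x2);
    CREATE TABLE y(y1, y2);
    CREATE TABLE z(z1, z2);
    INSERT INTO x VALUES (0, 1), (1, 2);
    INSERT INTO y VALUES (0, 2), (0, 3);
    INSERT INTO z VALUES (0, 2), (0, 3);
""")

# The query from the report: correlated EXISTS inside the join ON condition.
original = conn.execute("""
    SELECT * FROM x
    LEFT JOIN y ON x1 = y1 AND EXISTS (SELECT * FROM z WHERE z1 = x1)
    ORDER BY x1, y2
""").fetchall()

# Hypothetical rewrite: because the join requires x1 = y1, the EXISTS can be
# evaluated against y1 and pushed into a filter over y before the join.
rewritten = conn.execute("""
    SELECT * FROM x
    LEFT JOIN (SELECT * FROM y
               WHERE EXISTS (SELECT * FROM z WHERE z1 = y1)) AS yf
      ON x1 = y1
    ORDER BY x1, y2
""").fetchall()

print(original)
print(rewritten)
```

Both queries produce the same rows on this data: the x row with x1 = 0 joins both y rows, and the x row with x1 = 1 is kept with NULLs on the y side.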
PullupCorrelatedPredicates#apply and RewritePredicateSubquery only handle
subqueries in a UnaryNode; they seem to assume that subqueries cannot occur
elsewhere, such as in a join ON condition.
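The gap described above can be illustrated with a toy model. This is not Catalyst code: every class and function name below is hypothetical, and the rewrite is only a sketch of a rule that matches subqueries on unary nodes and passes binary nodes through with their condition untouched.

```python
from dataclasses import dataclass

# Toy plan nodes (all hypothetical; a stand-in for a real logical plan).
@dataclass
class Exists:
    subquery: str          # text of the correlated predicate, e.g. "z1 = x1"

@dataclass
class Relation:
    name: str

@dataclass
class Filter:              # unary node: one child
    condition: object
    child: object

@dataclass
class Join:                # binary node: two children plus an ON condition
    left: object
    right: object
    condition: object

@dataclass
class SemiJoin:            # what an EXISTS filter is rewritten into
    left: object
    subquery: str

def rewrite(plan):
    """Rewrite EXISTS into a semi join, but only when it sits on a Filter."""
    if isinstance(plan, Filter):
        child = rewrite(plan.child)
        if isinstance(plan.condition, Exists):
            return SemiJoin(child, plan.condition.subquery)
        return Filter(plan.condition, child)
    if isinstance(plan, Join):
        # Children are visited, but the ON condition is never inspected.
        return Join(rewrite(plan.left), rewrite(plan.right), plan.condition)
    return plan

def has_exists(plan):
    """Report whether any unrewritten EXISTS is left in the plan."""
    if isinstance(plan, Filter):
        return isinstance(plan.condition, Exists) or has_exists(plan.child)
    if isinstance(plan, Join):
        return (isinstance(plan.condition, Exists)
                or has_exists(plan.left) or has_exists(plan.right))
    if isinstance(plan, SemiJoin):
        return has_exists(plan.left)
    return False

in_filter = Filter(Exists("z1 = x1"), Relation("x"))
in_join = Join(Relation("x"), Relation("y"), Exists("z1 = x1"))

print(has_exists(rewrite(in_filter)))  # EXISTS on a unary node is rewritten
print(has_exists(rewrite(in_join)))    # EXISTS in the ON condition survives
```

In this model the leftover EXISTS in the join condition plays the role of the unresolved outer reference that later stages cannot plan.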
We will need to decide whether to block them properly in analysis (i.e. give a
proper error for them), or see if we can add support for them.
Note also that scalar subqueries in the ON condition are unsupported as well,
but they return a proper error.