Jack Chen created SPARK-45009:
---------------------------------

             Summary: Correlated EXISTS subqueries in join ON condition 
unsupported and fail with internal error
                 Key: SPARK-45009
                 URL: https://issues.apache.org/jira/browse/SPARK-45009
             Project: Spark
          Issue Type: Sub-task
          Components: SQL
    Affects Versions: 3.4.0
            Reporter: Jack Chen


They are not handled in decorrelation, and we also don’t have any checks to 
block them, so these queries have outer references in the query plan leading to 
internal errors:
{code:java}
CREATE TEMP VIEW x(x1, x2) AS VALUES (0, 1), (1, 2);
CREATE TEMP VIEW y(y1, y2) AS VALUES (0, 2), (0, 3);
CREATE TEMP VIEW z(z1, z2) AS VALUES (0, 2), (0, 3);
select * from x left join y on x1 = y1 and exists (select * from z where z1 = 
x1)

Error occurred during query planning: 
org.apache.spark.sql.catalyst.plans.logical.Filter cannot be cast to 
org.apache.spark.sql.execution.SparkPlan {code}
PullupCorrelatedPredicates#apply and RewritePredicateSubquery only handle 
subqueries in UnaryNode, it seems to assume that they cannot occur elsewhere, 
like in a join ON condition.

We will need to decide whether to block them properly in analysis (i.e. give a 
proper error for them), or see if we can add support for them.

Also note, scalar subqueries in the ON condition are unsupported too but return 
a proper error.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to