[ 
https://issues.apache.org/jira/browse/SPARK-43838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-43838.
---------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 41347
[https://github.com/apache/spark/pull/41347]

> Subquery on single table with having clause can't be optimized
> --------------------------------------------------------------
>
>                 Key: SPARK-43838
>                 URL: https://issues.apache.org/jira/browse/SPARK-43838
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Jia Fan
>            Assignee: Jia Fan
>            Priority: Major
>             Fix For: 4.0.0
>
>
> Eg:
> {code:java}
> sql("create view t(c1, c2) as values (0, 1), (0, 2), (1, 2)")
> sql("select c1, c2, (select count(*) cnt from t t2 where t1.c1 = t2.c1 " +
> "having cnt = 0) from t t1").show() {code}
> The error will throw:
> {code:java}
> [PLAN_VALIDATION_FAILED_RULE_IN_BATCH] Rule 
> org.apache.spark.sql.catalyst.optimizer.RewriteCorrelatedScalarSubquery in 
> batch Operator Optimization before Inferring Filters generated an invalid 
> plan: The plan becomes unresolved: 'Project [toprettystring(c1#224, 
> Some(America/Los_Angeles)) AS toprettystring(c1)#238, toprettystring(c2#225, 
> Some(America/Los_Angeles)) AS toprettystring(c2)#239, 
> toprettystring(cnt#246L, Some(America/Los_Angeles)) AS 
> toprettystring(scalarsubquery(c1))#240]
> +- 'Project [c1#224, c2#225, CASE WHEN isnull(alwaysTrue#245) THEN 0 WHEN NOT 
> (cnt#222L = 0) THEN null ELSE cnt#222L END AS cnt#246L]
>    +- 'Join LeftOuter, (c1#224 = c1#224#244)
>       :- Project [col1#226 AS c1#224, col2#227 AS c2#225]
>       :  +- LocalRelation [col1#226, col2#227]
>       +- Project [cnt#222L, c1#224#244, cnt#222L, c1#224, true AS 
> alwaysTrue#245]
>          +- Project [cnt#222L, c1#224 AS c1#224#244, cnt#222L, c1#224]
>             +- Aggregate [c1#224], [count(1) AS cnt#222L, c1#224]
>                +- Project [col1#228 AS c1#224]
>                   +- LocalRelation [col1#228, col2#229]The previous plan: 
> Project [toprettystring(c1#224, Some(America/Los_Angeles)) AS 
> toprettystring(c1)#238, toprettystring(c2#225, Some(America/Los_Angeles)) AS 
> toprettystring(c2)#239, toprettystring(scalar-subquery#223 [c1#224 && (c1#224 
> = c1#224#244)], Some(America/Los_Angeles)) AS 
> toprettystring(scalarsubquery(c1))#240]
> :  +- Project [cnt#222L, c1#224 AS c1#224#244]
> :     +- Filter (cnt#222L = 0)
> :        +- Aggregate [c1#224], [count(1) AS cnt#222L, c1#224]
> :           +- Project [col1#228 AS c1#224]
> :              +- LocalRelation [col1#228, col2#229]
> +- Project [col1#226 AS c1#224, col2#227 AS c2#225]
>    +- LocalRelation [col1#226, col2#227] {code}
>  
> The reason are when execute subquery decorrelation, the fields in the 
> subquery but not in having clause are wrongly pull up. This problem only 
> occurs when there contain having clause.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to