[ https://issues.apache.org/jira/browse/SPARK-43760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wenchen Fan resolved SPARK-43760.
---------------------------------
Fix Version/s: 3.5.0
Resolution: Fixed
Issue resolved by pull request 41287
[https://github.com/apache/spark/pull/41287]
> Incorrect attribute nullability after RewriteCorrelatedScalarSubquery leads
> to incorrect query results
> ------------------------------------------------------------------------------------------------------
>
> Key: SPARK-43760
> URL: https://issues.apache.org/jira/browse/SPARK-43760
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: Andrey Gubichev
> Priority: Major
> Fix For: 3.5.0
>
>
> The following query:
>
> {code:java}
> select * from (
>   select t1.id c1, (
>     select t2.id c from range (1, 2) t2
>     where t1.id = t2.id ) c2
>   from range (1, 3) t1 ) t
> where t.c2 is not null
> -- !query schema
> struct<c1:bigint,c2:bigint>
> -- !query output
> 1 1
> 2 NULL
> {code}
>
> should return 1 row, because the IsNotNull predicate is supposed to remove the
> second row (2, NULL). However, because nullability is propagated incorrectly
> after subquery decorrelation, the output of the subquery is wrongly declared
> non-nullable, so the predicate is constant-folded to true and the filter has
> no effect.
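
The mechanism described above can be illustrated with a small toy model. This is only a sketch in plain Python, not Spark's code: the `Attribute` class and the helpers `left_outer_join_rows` and `filter_is_not_null` are hypothetical names invented for illustration, and the sketch assumes that the decorrelated subquery behaves like a left outer join between t1 and t2. It shows how a wrong nullability flag on c2 lets an optimizer drop the IsNotNull filter, which reproduces the two-row result.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Row = Tuple[int, Optional[int]]


@dataclass(frozen=True)
class Attribute:
    """Hypothetical stand-in for an output column and its nullability flag."""
    name: str
    nullable: bool


def left_outer_join_rows() -> List[Row]:
    """Model the decorrelated subquery as t1 LEFT OUTER JOIN t2 ON t1.id = t2.id."""
    t1 = range(1, 3)  # ids 1, 2
    t2 = range(1, 2)  # id 1
    rows: List[Row] = []
    for a in t1:
        matches = [b for b in t2 if a == b]
        # Unmatched left rows get NULL (None) on the right side.
        rows.append((a, matches[0] if matches else None))
    return rows


def filter_is_not_null(rows: List[Row], c2: Attribute) -> List[Row]:
    """Apply `c2 IS NOT NULL`, trusting the declared nullability of c2."""
    if not c2.nullable:
        # Declared non-nullable: the predicate folds to TRUE, so no filtering happens.
        return list(rows)
    return [r for r in rows if r[1] is not None]


rows = left_outer_join_rows()
# Wrong flag (assumed to be copied from t2.id without accounting for the outer join).
buggy = filter_is_not_null(rows, Attribute("c2", nullable=False))
# Correct flag: the right side of a left outer join can be NULL.
fixed = filter_is_not_null(rows, Attribute("c2", nullable=True))
print(buggy)  # [(1, 1), (2, None)]
print(fixed)  # [(1, 1)]
```

In this model the data is identical in both cases; only the declared nullability of c2 differs, and that alone decides whether the NULL row survives.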