[
https://issues.apache.org/jira/browse/SPARK-44550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jack Chen updated SPARK-44550:
------------------------------
Fix Version/s: (was: 3.5.0)
> Wrong semantics for null IN (empty list)
> ----------------------------------------
>
> Key: SPARK-44550
> URL: https://issues.apache.org/jira/browse/SPARK-44550
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: Jack Chen
> Assignee: Jack Chen
> Priority: Major
>
> {{null IN (empty list)}} incorrectly evaluates to null, when it should
> evaluate to false. (The reason it should be false is because a IN (b1, b2) is
> defined as a = b1 OR a = b2, and an empty IN list is treated as an empty OR
> which is false. This is specified by ANSI SQL.)
> Many places in Spark execution (In, InSet, InSubquery) and optimization
> (OptimizeIn, NullPropagation) implemented this wrong behavior. Also note that
> the Spark behavior for the null IN (empty list) is inconsistent in some
> places - literal IN lists generally return null (incorrect), while IN/NOT IN
> subqueries mostly return false/true, respectively (correct) in this case.
> This is a longstanding correctness issue which has existed since null support
> for IN expressions was first added to Spark.
> Doc with more details:
> [https://docs.google.com/document/d/1k8AY8oyT-GI04SnP7eXttPDnDj-Ek-c3luF2zL6DPNU/edit]
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]