Jack Chen created SPARK-43413:
---------------------------------

             Summary: IN subquery ListQuery has wrong nullability
                 Key: SPARK-43413
                 URL: https://issues.apache.org/jira/browse/SPARK-43413
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.4.0
            Reporter: Jack Chen


IN subquery expressions are currently marked as nullable if and only if the 
left-hand side is nullable, because the right-hand side of an IN subquery, the 
ListQuery, is always defined with nullability = false. This is incorrect and 
can lead to incorrect query transformations.

Example: (non_nullable_col IN (select nullable_col)) <=> TRUE. Here the IN 
expression returns NULL when nullable_col is NULL and no row matches, but our 
code marks it as non-nullable, so SimplifyBinaryComparison removes the <=> TRUE 
and rewrites the expression to non_nullable_col IN (select nullable_col). This 
rewrite is incorrect because NULL values of nullable_col now cause the 
expression to yield NULL instead of FALSE.
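
The difference between the two forms can be checked with a small model of SQL
three-valued logic. This is only a sketch in plain Python, not Spark code: the
names sql_in, null_safe_eq and Tri are hypothetical helpers introduced for
illustration, and None stands in for SQL NULL.

```python
from typing import Optional, Sequence

Tri = Optional[bool]  # True, False, or None standing in for SQL NULL


def sql_in(value: Optional[int], candidates: Sequence[Optional[int]]) -> Tri:
    """Three-valued SQL `value IN (candidates)` (illustrative model only)."""
    if not candidates:
        return False          # IN over an empty set is FALSE
    if value is None:
        return None           # NULL compared with anything is NULL
    if any(c is not None and c == value for c in candidates):
        return True           # a definite match wins
    if any(c is None for c in candidates):
        return None           # no match, but a NULL might have matched
    return False


def null_safe_eq(a: Tri, b: Tri) -> bool:
    """SQL `a <=> b`: NULL <=> NULL is TRUE, and the result is never NULL."""
    if a is None or b is None:
        return a is None and b is None
    return a == b


# non_nullable_col = 1, and the subquery yields a single NULL nullable_col.
in_result = sql_in(1, [None])
original = null_safe_eq(in_result, True)   # (... IN ...) <=> TRUE
simplified = in_result                     # after the incorrect rewrite

print("original:", original)      # original: False
print("simplified:", simplified)  # simplified: None
```

In this model the original expression evaluates to FALSE while the rewritten
one evaluates to NULL, so the two are not equivalent. The difference becomes
visible to a query whenever the result is projected or negated: NOT FALSE is
TRUE, but NOT NULL is still NULL.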

This is a long-standing bug: it has existed since at least 2016, for as long as 
the ListQuery class has existed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

