Jack Chen created SPARK-43413: --------------------------------- Summary: IN subquery ListQuery has wrong nullability Key: SPARK-43413 URL: https://issues.apache.org/jira/browse/SPARK-43413 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.4.0 Reporter: Jack Chen
IN subquery expressions currently are marked as nullable if and only if the left-hand-side is nullable - because the right-hand-side of a IN subquery, the ListQuery, is currently defined with nullability = false always. This is incorrect and can lead to incorrect query transformations. Example: (non_nullable_col IN (select nullable_col)) <=> TRUE . Here the IN expression returns NULL when the nullable_col is null, but our code marks it as non-nullable, and therefore SimplifyBinaryComparison transforms away the <=> TRUE, transforming the expression to non_nullable_col IN (select nullable_col) , which is an incorrect transformation because NULL values of nullable_col now cause the expression to yield NULL instead of FALSE. This is a long-standing bug that has existed at least since 2016, as long as the ListQuery class has existed. -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org