neilconway opened a new pull request, #22444: URL: https://github.com/apache/datafusion/pull/22444
## Which issue does this PR close?

- Closes #22441.

## Rationale for this change

`EliminateOuterJoin` needs to identify "null-rejecting" columns; a column is null-rejecting with respect to an expression if a NULL value in the column yields a NULL or false value for the expression.

This analysis was unsound for the `IS TRUE`, `IS FALSE`, and `IS NOT UNKNOWN` operators: those operators are null-rejecting at the root of the WHERE clause, but they may not be null-rejecting when nested inside an expression tree. The analysis checked this correctly for `IS NOT NULL` but neglected to apply similar logic to these other three operators. As a result, outer joins were incorrectly converted to inner joins in some cases, producing incorrect query results.

As part of fixing this, this PR also makes several improvements to the null-rejection analysis, enumerated below, which make the analysis more accurate.

## What changes are included in this PR?

* Rename `extract_non_nullable_columns` to `extract_null_rejecting_columns`: "nullability" is a property of a column, whereas "null-rejecting" is a more complex property describing the relationship between a column and an expression.
* Use `Operator::returns_null_on_null()` instead of maintaining a hand-rolled and very incomplete list of null-propagating binary expressions. We now compute null-rejection correctly for arithmetic, bitwise, and regex operators, for example.
* Handle null-rejection correctly for `Expr::Negative`
* Rewrite the logic for handling `OR` and nested `AND` operators to be clearer and more efficient
* Rewrite and expand comments throughout for clarity
* Add unit and SLT tests

## Are these changes tested?

Yes; new unit and SLT tests added.

## Are there any user-facing changes?

Yes, query result correctness fix.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
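To illustrate the rationale above, here is a minimal sketch of why `IS TRUE`, `IS FALSE`, and `IS NOT UNKNOWN` stop being null-rejecting once they are nested. It is a toy model, not DataFusion's code: `Tri`, `Pred`, `eval`, and `is_null_rejecting` are hypothetical names, the model has a single nullable boolean column `c`, and it checks the property by brute-force evaluation rather than by the static analysis that `EliminateOuterJoin` performs.

```rust
/// SQL three-valued boolean; `None` stands for NULL / UNKNOWN.
type Tri = Option<bool>;

/// Toy predicate over one nullable boolean column `c`.
/// Hypothetical model for illustration only; this is not DataFusion's `Expr`.
#[allow(dead_code)]
enum Pred {
    /// Reference to the column `c`.
    Col,
    /// `NOT p`: NULL stays NULL.
    Not(Box<Pred>),
    /// `p IS TRUE`: never NULL.
    IsTrue(Box<Pred>),
    /// `p IS FALSE`: never NULL.
    IsFalse(Box<Pred>),
    /// `p IS NOT UNKNOWN`: never NULL.
    IsNotUnknown(Box<Pred>),
}

/// Evaluate `p` for a given value of `c` under SQL three-valued logic.
fn eval(p: &Pred, c: Tri) -> Tri {
    match p {
        Pred::Col => c,
        Pred::Not(inner) => eval(inner, c).map(|b| !b),
        Pred::IsTrue(inner) => Some(eval(inner, c) == Some(true)),
        Pred::IsFalse(inner) => Some(eval(inner, c) == Some(false)),
        Pred::IsNotUnknown(inner) => Some(eval(inner, c).is_some()),
    }
}

/// `c` is null-rejecting for `p` if a NULL `c` makes `p` NULL or false,
/// i.e. a WHERE clause containing only `p` would discard the row.
fn is_null_rejecting(p: &Pred) -> bool {
    eval(p, None) != Some(true)
}
```

In this model, `is_null_rejecting(&Pred::IsTrue(Box::new(Pred::Col)))` holds, because `NULL IS TRUE` is false. Wrapping the same predicate in `Pred::Not` makes it evaluate to true for a NULL `c`, so `NOT (c IS TRUE)` is not null-rejecting, and converting an outer join to an inner join based on it would drop rows that the filter should keep.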
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
