neilconway opened a new pull request, #22444:
URL: https://github.com/apache/datafusion/pull/22444

   ## Which issue does this PR close?
   
   - Closes #22441.
   
   ## Rationale for this change
   
   `EliminateOuterJoin` needs to identify "null-rejecting" columns; a column is 
null-rejecting with respect to an expression if a NULL value in the column 
yields a NULL or false value for the expression. This analysis was unsound with 
respect to `IS TRUE`, `IS FALSE`, and `IS NOT UNKNOWN` operators: those 
operators are null-rejecting at the root of the WHERE clause, but they are not 
necessarily null-rejecting when nested inside an expression tree (e.g., under 
`NOT`). The analysis checked
this correctly for `IS NOT NULL` but neglected to apply similar logic for these 
other three operators. This resulted in incorrectly converting outer joins to 
inner joins in some cases, producing incorrect query results.
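To illustrate the nesting problem, here is a minimal Rust sketch that models SQL three-valued logic with `Option<bool>`, where `None` stands for NULL/UNKNOWN. The helpers `is_true`, `not`, and `where_keeps` are hypothetical and are not DataFusion APIs. The sketch shows that `c IS TRUE` filters out a row where `c` is NULL, whereas `NOT (c IS TRUE)` keeps that row. Treating `IS TRUE` as null-rejecting wherever it appears is therefore unsound.

```rust
/// SQL three-valued boolean: Some(true), Some(false), or None (NULL/UNKNOWN).
type SqlBool = Option<bool>;

/// `x IS TRUE` never returns NULL: it maps NULL (and FALSE) to FALSE.
fn is_true(x: SqlBool) -> SqlBool {
    Some(x == Some(true))
}

/// SQL `NOT` propagates NULL and negates TRUE/FALSE.
fn not(x: SqlBool) -> SqlBool {
    x.map(|b| !b)
}

/// A WHERE clause keeps a row only if the predicate is exactly TRUE.
fn where_keeps(pred: SqlBool) -> bool {
    pred == Some(true)
}

fn main() {
    // Column value is NULL, e.g. padded in by an outer join.
    let c: SqlBool = None;

    // At the root, `c IS TRUE` rejects the NULL row.
    assert!(!where_keeps(is_true(c)));

    // Nested under NOT, the same operator lets the NULL row through:
    // NOT (NULL IS TRUE) = NOT FALSE = TRUE.
    assert!(where_keeps(not(is_true(c))));
}
```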
   
   As part of this fix, this PR also makes several other improvements to the 
null-rejection analysis, enumerated below, so that it identifies null-rejecting 
columns more accurately.
   
   ## What changes are included in this PR?
   
   * Rename `extract_non_nullable_columns` to `extract_null_rejecting_columns`: 
"nullability" is a property of a column, "null-rejecting" is a more complex 
property describing the relationship between a column and an expression.
   * Use `Operator::returns_null_on_null()` instead of maintaining a 
hand-rolled and very incomplete list of null-propagating binary expressions. We 
now compute null-rejection correctly for arithmetic, bitwise, and regex 
operators, for example.
   * Handle null-rejection correctly for `Expr::Negative`
   * Rewrite the logic for handling `OR` and nested `AND` operators to be 
clearer and more efficient
   * Rewrite and expand comments throughout for clarity
   * Add unit and SLT tests
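The rules described in the list above can be sketched on a toy expression tree. This is a hypothetical, simplified model in Rust, not DataFusion's `Expr`, `Operator`, or `extract_null_rejecting_columns`. `BinOp::returns_null_on_null` here only stands in for the idea behind `Operator::returns_null_on_null()`. The sketch separates two questions. Null-propagating columns are those whose NULL makes an expression NULL. Null-rejecting columns are those whose NULL makes a predicate NULL or FALSE. Under these rules, `AND` takes the union of its children's null-rejecting sets and `OR` takes their intersection. `IS TRUE` preserves rejection, while `NOT` keeps only null-propagating columns.

```rust
use std::collections::BTreeSet;

/// Hypothetical, simplified binary operators (not DataFusion's `Operator`).
#[derive(Debug, Clone, Copy, PartialEq)]
enum BinOp { Eq, Plus, And, Or, IsDistinctFrom }

impl BinOp {
    /// Stand-in for "returns NULL if either input is NULL".
    fn returns_null_on_null(self) -> bool {
        matches!(self, BinOp::Eq | BinOp::Plus)
    }
}

/// Hypothetical, simplified expression tree (not DataFusion's `Expr`).
#[allow(dead_code)]
enum Expr {
    Column(&'static str),
    Literal(i64),
    Negative(Box<Expr>),
    Not(Box<Expr>),
    IsTrue(Box<Expr>),
    Binary(Box<Expr>, BinOp, Box<Expr>),
}

type Cols = BTreeSet<&'static str>;

/// Columns whose NULL value forces `expr` to evaluate to NULL.
fn null_propagating(expr: &Expr) -> Cols {
    match expr {
        Expr::Column(c) => Cols::from([*c]),
        Expr::Literal(_) => Cols::new(),
        // Unary minus and NOT both map NULL to NULL.
        Expr::Negative(e) | Expr::Not(e) => null_propagating(e),
        // `IS TRUE` never returns NULL.
        Expr::IsTrue(_) => Cols::new(),
        Expr::Binary(l, op, r) if op.returns_null_on_null() => {
            null_propagating(l).union(&null_propagating(r)).copied().collect()
        }
        // AND, OR, IS DISTINCT FROM may return non-NULL on a NULL input.
        Expr::Binary(..) => Cols::new(),
    }
}

/// Columns whose NULL value forces the predicate to be NULL or FALSE.
fn null_rejecting(pred: &Expr) -> Cols {
    match pred {
        // If either side is NULL/FALSE, the AND is NULL/FALSE.
        Expr::Binary(l, BinOp::And, r) => {
            null_rejecting(l).union(&null_rejecting(r)).copied().collect()
        }
        // An OR is NULL/FALSE only if both sides are.
        Expr::Binary(l, BinOp::Or, r) => {
            null_rejecting(l).intersection(&null_rejecting(r)).copied().collect()
        }
        // NULL IS TRUE and FALSE IS TRUE are both FALSE, so rejection is kept...
        Expr::IsTrue(e) => null_rejecting(e),
        // ...but NOT turns FALSE into TRUE; only NULL-propagation survives it.
        Expr::Not(e) => null_propagating(e),
        other => null_propagating(other),
    }
}

fn col(c: &'static str) -> Box<Expr> { Box::new(Expr::Column(c)) }
fn lit(v: i64) -> Box<Expr> { Box::new(Expr::Literal(v)) }
fn bin(l: Box<Expr>, op: BinOp, r: Box<Expr>) -> Box<Expr> {
    Box::new(Expr::Binary(l, op, r))
}

fn main() {
    // c IS TRUE: a NULL c yields FALSE, so c is null-rejecting.
    assert_eq!(null_rejecting(&Expr::IsTrue(col("c"))), Cols::from(["c"]));
    // NOT (c IS TRUE): a NULL c yields TRUE, so c is not null-rejecting.
    assert!(null_rejecting(&Expr::Not(Box::new(Expr::IsTrue(col("c"))))).is_empty());
    // -a + b = 1: negation and arithmetic propagate NULL from a and b.
    let p = bin(bin(Box::new(Expr::Negative(col("a"))), BinOp::Plus, col("b")), BinOp::Eq, lit(1));
    assert_eq!(null_rejecting(&p), Cols::from(["a", "b"]));
    // a = 1 OR b = 1: neither column alone forces the OR to NULL/FALSE.
    let p = bin(bin(col("a"), BinOp::Eq, lit(1)), BinOp::Or, bin(col("b"), BinOp::Eq, lit(1)));
    assert!(null_rejecting(&p).is_empty());
    // a = 1 AND b IS DISTINCT FROM 1: only a is null-rejecting.
    let p = bin(bin(col("a"), BinOp::Eq, lit(1)), BinOp::And, bin(col("b"), BinOp::IsDistinctFrom, lit(1)));
    assert_eq!(null_rejecting(&p), Cols::from(["a"]));
}
```

The sketch is deliberately conservative. Reporting too few columns only costs a missed optimization. Reporting too many would wrongly turn an outer join into an inner join.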
   
   ## Are these changes tested?
   
   Yes; new unit and SLT tests added.
   
   ## Are there any user-facing changes?
   
   Yes, query result correctness fix.

