SubhamSinghal opened a new issue, #21302:
URL: https://github.com/apache/datafusion/issues/21302
### Is your feature request related to a problem or challenge?
The EliminateOuterJoin optimizer rule converts outer joins to inner joins
when WHERE clause predicates reject null rows from
the nullable side. However, extract_non_nullable_columns only recognizes a
limited set of expression types (comparisons,
AND/OR, NOT, IS NOT NULL, CAST/TRY_CAST) and falls through to a no-op
default for all others.
This means common null-rejecting expressions like IN and BETWEEN are not
recognized. For example:
SELECT * FROM t1 LEFT JOIN t2 ON t1.a = t2.a WHERE t2.b IN (1, 2, 3);
The IN predicate rejects nulls — if t2.b is NULL (from LEFT JOIN padding),
the expression evaluates to NULL, which is filtered
out. So this LEFT JOIN is semantically equivalent to an INNER JOIN, but the
optimizer keeps it as a LEFT JOIN today.
Similarly for BETWEEN:
SELECT * FROM t1 LEFT JOIN t2 ON t1.a = t2.a WHERE t2.b BETWEEN 1 AND 10;
### Describe the solution you'd like
Add Expr::InList and Expr::Between handling to extract_non_nullable_columns
in eliminate_outer_join.rs. Both expressions are null-rejecting on their input
column — if the input is NULL, the result is always NULL regardless of the
list/range contents (even if the list contains NULL values).
### Describe alternatives you've considered
_No response_
### Additional context
_No response_
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]