wolffcm opened a new pull request, #6077:
URL: https://github.com/apache/arrow-datafusion/pull/6077

   # Which issue does this PR close?
   
   Closes #5996.
   
   # Rationale for this change
   
   Fixing this will allow lots of expression simplification to succeed where it 
failed before.
   
   In IOx we would like to be able to run the logical optimizer with 
`skip_failed_rules` set to `false`, since some of our custom rules are required 
to run and produce user-facing errors.
   
   When I ran IOx tests with `skip_failed_rules` set to `false`, I found issue 
#5996.
   
   # What changes are included in this PR?
   
   The issue is with the `SimplifyExpressions` logical rule. As it traverses 
the logical plan, it includes both the output schema for the current node and 
the merged input schema from the node's inputs.
   
   The inclusion of two schemas causes an issue when various expression 
simplifiers inquire about attributes of expressions, e.g., here, which was the 
case for #5996:
   
https://github.com/apache/arrow-datafusion/blob/011adc203e4a02a822a8f0008852b90fb7441264/datafusion/optimizer/src/simplify_expressions/context.rs#L130-L139
   
   This PR refines the schema used when simplifying to just the node's merged 
input schema. There is just one exception to using the child schema: for table 
scans with inlined predicates, those expressions need to be evaluated against 
the scan's output schema.
   
   # Are these changes tested?
   
   I added unit tests for both the issue with `power(f, 1)` from the issue, and 
also for the case of an inlined table scan.
   
   As a baseline I ran all the unit tests on `main` with `skip_failed_rules` 
set to false. I found just two failures:
   - `sql::select::use_between_expression_in_select_query`
   - `sql::subqueries::support_limit_subquery`
   
   (Seems like there should be more, based on #4685? Maybe I am not running all 
of the tests)
   
   When I run my branch wtih `skip_failed_rules` set to `false`. There was just 
one failure, which was already on `main`:
   - `sql::subqueries::support_limit_subquery`
   
   So this PR also allows `use_between_expression_in_select_query` to pass when 
skipping failed rules. It was failing in a similar way as #5996.
   
   I also ensured that the tests mentioned in #5208 (a recent bug fix that also 
touches on what schemas to use) all pass on my branch with `skip_failed_rules` 
set to `false`:
   - `right_anti_filter_push_down`
   - `right_semi_with_alias_filter`
   - `csv_query_group_by_and_having_and_where` (has inlined table predicate)
   
   # Are there any user-facing changes?
   
   No.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to