wolffcm opened a new pull request, #6077: URL: https://github.com/apache/arrow-datafusion/pull/6077
# Which issue does this PR close? Closes #5996. # Rationale for this change Fixing this will allow lots of expression simplification to succeed where it failed before. In IOx we would like to be able to run the logical optimizer with `skip_failed_rules` set to `false`, since some of our custom rules are required to run and produce user-facing errors. When I ran IOx tests with `skip_failed_rules` set to `false`, I found issue #5996. # What changes are included in this PR? The issue is with the `SimplifyExpressions` logical rule. As it traverses the logical plan, it includes both the output schema for the current node and the merged input schema from the node's inputs. The inclusion of two schemas causes an issue when various expression simplifiers inquire about attributes of expressions, e.g., here, which was the case for #5996: https://github.com/apache/arrow-datafusion/blob/011adc203e4a02a822a8f0008852b90fb7441264/datafusion/optimizer/src/simplify_expressions/context.rs#L130-L139 This PR refines the schema used when simplifying to just the node's merged input schema. There is just one exception to using the child schema: for table scans with inlined predicates, those expressions need to be evaluated against the scan's output schema. # Are these changes tested? I added unit tests for both the issue with `power(f, 1)` from the issue, and also for the case of an inlined table scan. As a baseline I ran all the unit tests on `main` with `skip_failed_rules` set to false. I found just two failures: - `sql::select::use_between_expression_in_select_query` - `sql::subqueries::support_limit_subquery` (Seems like there should be more, based on #4685? Maybe I am not running all of the tests) When I run my branch wtih `skip_failed_rules` set to `false`. There was just one failure, which was already on `main`: - `sql::subqueries::support_limit_subquery` So this PR also allows `use_between_expression_in_select_query` to pass when skipping failed rules. It was failing in a similar way as #5996. I also ensured that the tests mentioned in #5208 (a recent bug fix that also touches on what schemas to use) all pass on my branch with `skip_failed_rules` set to `false`: - `right_anti_filter_push_down` - `right_semi_with_alias_filter` - `csv_query_group_by_and_having_and_where` (has inlined table predicate) # Are there any user-facing changes? No. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
