neilconway opened a new pull request, #22343: URL: https://github.com/apache/datafusion/pull/22343
## Which issue does this PR close?

- Closes #11262.

## Rationale for this change

If a filter consists of a mix of cheap and expensive predicates, evaluating the cheap predicates first can be more efficient, because it reduces the number of rows that the expensive predicate must be evaluated on.

This PR implements this idea, by reordering predicates in a conjunction to place "cheap" predicates first. Predicates are assessed as "cheap" or "expensive" using an intentionally simple heuristic: "cheap" predicates are expressions that consist of only cheap operations like binary comparisons, negations, and casts, and "expensive" predicates are everything else (e.g., `LIKE`, regexp matching, subqueries, and function calls). Importantly, we use a stable sort when reordering predicates, which means that the user-specified order of operations is preserved within these two classes.

We avoid reordering predicates if the filter contains a volatile expression, to be safe. We could be a bit fancier and reorder conjuncts in the prefix of the filter list before the volatile expression, but we don't attempt to do that for now.

## What changes are included in this PR?

* Add a new `reorder_predicates` helper
* Invoke `reorder_predicates` as part of the `PushDownFilter` rewrite pass
* Add unit tests for `reorder_predicates`
* Update expected query plans in SLT
* Add migration guide note for change to predicate evaluation order

## Are these changes tested?

Yes. Added new unit tests for predicate reordering behavior, updated some expected `EXPLAIN` output.

## Are there any user-facing changes?

Yes. Users that expect their predicates to be evaluated in a strictly left-to-right manner might see changes in performance and/or behavior. Performance changes could be improvements or regressions. Behavioral changes are possible if the query includes fallible operations like certain casts or division by zero.
Note that the SQL standard is clear that implementations are allowed to evaluate predicates in any order, so user queries that depend on an evaluation order are fundamentally fragile. Users can rewrite predicates using `CASE` if they need to enforce an evaluation order.
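
For readers who want a concrete picture of the reordering described under the rationale, here is a minimal, self-contained sketch. It is not the code in this PR: the `Expr` enum, `is_cheap`, `is_volatile`, and `reorder_conjuncts` are hypothetical stand-ins invented for illustration, and treating column references and literals as cheap leaves is an assumption. It only shows the three ideas named above: a two-class cheap/expensive split, a stable sort that keeps the user's order within each class, and a bail-out when any conjunct is volatile.

```rust
/// Toy expression tree (hypothetical; not DataFusion's `Expr`).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    /// Binary comparison such as `a < 5`.
    Compare(Box<Expr>, Box<Expr>),
    Not(Box<Expr>),
    Cast(Box<Expr>),
    /// `expr LIKE pattern`.
    Like(Box<Expr>, String),
    /// Function call; `volatile` marks functions such as `random()`.
    Call { name: String, args: Vec<Expr>, volatile: bool },
}

/// Assumed heuristic: an expression is cheap only if every node in it is a
/// column, literal, comparison, negation, or cast.
fn is_cheap(e: &Expr) -> bool {
    match e {
        Expr::Column(_) | Expr::Literal(_) => true,
        Expr::Compare(l, r) => is_cheap(l) && is_cheap(r),
        Expr::Not(inner) | Expr::Cast(inner) => is_cheap(inner),
        Expr::Like(..) | Expr::Call { .. } => false,
    }
}

/// True if any node in the expression is a volatile function call.
fn is_volatile(e: &Expr) -> bool {
    match e {
        Expr::Column(_) | Expr::Literal(_) => false,
        Expr::Compare(l, r) => is_volatile(l) || is_volatile(r),
        Expr::Not(inner) | Expr::Cast(inner) => is_volatile(inner),
        Expr::Like(inner, _) => is_volatile(inner),
        Expr::Call { args, volatile, .. } => *volatile || args.iter().any(is_volatile),
    }
}

/// Move cheap conjuncts ahead of expensive ones, preserving the original
/// relative order within each class. Leaves the input untouched if any
/// conjunct is volatile.
fn reorder_conjuncts(mut conjuncts: Vec<Expr>) -> Vec<Expr> {
    if conjuncts.iter().any(is_volatile) {
        return conjuncts;
    }
    // `sort_by_key` is a stable sort, and `false < true`, so cheap
    // predicates (key `false`) end up first.
    conjuncts.sort_by_key(|e| !is_cheap(e));
    conjuncts
}
```

In this sketch, a conjunct list equivalent to `s LIKE '%x%' AND a = 1 AND b = 2` would come back as `a = 1`, `b = 2`, `s LIKE '%x%'`, while a list containing a volatile call would be returned in its original order.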
