neilconway opened a new pull request, #22343:
URL: https://github.com/apache/datafusion/pull/22343

   ## Which issue does this PR close?
   
   - Closes #11262.
   
   ## Rationale for this change
   
   If a filter consists of a mix of cheap and expensive predicates, evaluating 
the cheap predicates first can be more efficient, because it reduces the number 
of rows on which the expensive predicates must be evaluated. This PR implements 
this idea by reordering the predicates in a conjunction to place "cheap" 
predicates first.
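
To illustrate the rationale above, here is a minimal sketch that counts how often an expensive predicate runs under each ordering. It assumes row-at-a-time evaluation with short-circuiting `&&`, which is a simplification and not how DataFusion's batch-oriented execution works; the function name `expensive_calls` and both predicates are hypothetical.

```rust
use std::cell::Cell;

/// Returns how many times the expensive predicate is invoked while filtering
/// `rows` one at a time with a short-circuiting conjunction.
fn expensive_calls(rows: &[i64], cheap_first: bool) -> usize {
    let calls = Cell::new(0usize);

    // Hypothetical cheap predicate: a plain comparison.
    let cheap = |x: i64| x % 10 == 0;

    // Hypothetical expensive predicate: a stand-in for LIKE or a regexp match.
    let expensive = |x: i64| {
        calls.set(calls.get() + 1);
        x.to_string().contains('5')
    };

    let _kept = rows
        .iter()
        .filter(|&&x| {
            if cheap_first {
                cheap(x) && expensive(x)
            } else {
                expensive(x) && cheap(x)
            }
        })
        .count();

    calls.get()
}
```

In this sketch, the expensive predicate only runs on rows that already passed the cheap one when `cheap_first` is true, so the saving grows with the selectivity of the cheap predicate.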
   
   Predicates are assessed as "cheap" or "expensive" using an intentionally 
simple heuristic: "cheap" predicates are expressions that consist of only cheap 
operations like binary comparisons, negations, and casts, and "expensive" 
predicates are everything else (e.g., `LIKE`, regexp matching, subqueries, and 
function calls). Importantly, we use a stable sort when reordering predicates, 
which means that the user-specified order of predicates is preserved within 
each of these two classes.
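
The heuristic and the stable sort described above can be sketched roughly as follows. This is not the code from the PR: the `Expr` enum here is a hypothetical, heavily simplified stand-in for DataFusion's real expression type, and `is_cheap` and `reorder_conjuncts` are illustrative names.

```rust
// Hypothetical, simplified expression type (an assumption for illustration).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    Compare(Box<Expr>, Box<Expr>), // binary comparison such as `a > 1`
    Not(Box<Expr>),
    Cast(Box<Expr>),
    Like(Box<Expr>, String),
    FunctionCall(String, Vec<Expr>),
}

/// A predicate is "cheap" only if every node in it is a cheap operation.
fn is_cheap(expr: &Expr) -> bool {
    match expr {
        Expr::Column(_) | Expr::Literal(_) => true,
        Expr::Compare(l, r) => is_cheap(l) && is_cheap(r),
        Expr::Not(e) | Expr::Cast(e) => is_cheap(e),
        Expr::Like(..) | Expr::FunctionCall(..) => false,
    }
}

/// Moves cheap conjuncts ahead of expensive ones. `sort_by_key` on a `Vec`
/// is a stable sort, so conjuncts in the same class keep their relative order.
fn reorder_conjuncts(mut conjuncts: Vec<Expr>) -> Vec<Expr> {
    // `false < true`, so cheap conjuncts (key `false`) sort first.
    conjuncts.sort_by_key(|e| !is_cheap(e));
    conjuncts
}
```

Sorting on a two-valued key with a stable sort amounts to a stable partition, which is why the original order survives inside each class.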
   
   We avoid reordering predicates if the filter contains a volatile expression, 
to be safe. We could be a bit fancier and reorder conjuncts in the prefix of 
the filter list before the volatile expression, but we don't attempt to do that 
for now.
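
A minimal sketch of the volatility guard described above, under stated assumptions: the `Conjunct` struct and its `cheap` and `volatile` flags are hypothetical and stand in for whatever analysis the real implementation performs on each expression.

```rust
// Hypothetical summary of one conjunct in a filter (illustration only).
#[derive(Debug, Clone, PartialEq)]
struct Conjunct {
    sql: &'static str,
    cheap: bool,
    volatile: bool,
}

/// Reorders cheap conjuncts first, unless any conjunct is volatile, in which
/// case the original order is returned unchanged.
fn reorder_unless_volatile(mut conjuncts: Vec<Conjunct>) -> Vec<Conjunct> {
    if conjuncts.iter().any(|c| c.volatile) {
        // Conservative choice: do not touch the filter at all.
        return conjuncts;
    }
    // Stable sort: relative order within each class is preserved.
    conjuncts.sort_by_key(|c| !c.cheap);
    conjuncts
}
```

The guard is all-or-nothing, matching the description: a single volatile conjunct anywhere in the list disables reordering, even for the conjuncts that precede it.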
   
   ## What changes are included in this PR?
   
   * Add a new `reorder_predicates` helper
   * Invoke `reorder_predicates` as part of the `PushDownFilter` rewrite pass
   * Add unit tests for `reorder_predicates`
   * Update expected query plans in SLT
   * Add migration guide note for change to predicate evaluation order
   
   ## Are these changes tested?
   
   Yes. Added new unit tests for predicate reordering behavior and updated some 
expected `EXPLAIN` output.
   
   ## Are there any user-facing changes?
   
   Yes. Users who expect their predicates to be evaluated in a strictly 
left-to-right manner might see changes in performance and/or behavior. 
Performance changes could be improvements or regressions. Behavioral changes 
are possible if the query includes fallible operations like certain casts or 
division by zero. Note that the SQL standard is clear that implementations are 
allowed to evaluate predicates in any order, so user queries that depend on an 
evaluation order are fundamentally fragile. Users can rewrite predicates using 
`CASE` if they need to enforce an evaluation order.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

