Re: [PR] Adaptive (runtime, stats-based) conjunct reordering for FilterExec [datafusion]

via GitHub Fri, 03 Jul 2026 00:58:34 -0700


adriangb commented on PR #22698:
URL: https://github.com/apache/datafusion/pull/22698#issuecomment-4874071913


   ## Benchmark summary (adaptive filter reordering)
   
   Ran two matched experiments on the PR head (`a4563568f3`), each pinning
   `baseline.ref == changed.ref` to the same commit so the binary is identical
   and only the config flag differs:
   
   - **Feature:** `adaptive_filter_reordering` off vs on ([trigger](https://github.com/apache/datafusion/pull/22698#issuecomment-4873746448))
   - **A/A control:** flag off on both sides ([trigger](https://github.com/apache/datafusion/pull/22698#issuecomment-4873746305)); this establishes the per-suite noise floor.
   
   I ran the pair twice. The control is what makes the numbers trustworthy,
   because the per-query noise is large (tpch/tpcds swings up to ±22–27% between
   identical binaries; clickbench has a ~4.6% systematic advantage to whichever
   side runs second). Only deltas that clear that floor, have tight variance,
   and are absent from the A/A control count.
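
The acceptance rule above can be sketched as a small check. This is an illustration under assumptions, not the benchmark bot's actual logic: the function name `counts_as_real`, its parameters, and the `min_sigma` threshold are all hypothetical, and the 5% noise floor used in the usage note is a made-up value.

```python
def counts_as_real(deltas_pct, noise_floor_pct, stddev_pct, min_sigma=3.0):
    """Return True if a per-query delta should be treated as a real effect.

    deltas_pct:      percentage change (flag on vs off), one entry per repeated run.
    noise_floor_pct: largest absolute swing attributed to noise by the A/A control.
    stddev_pct:      run-to-run standard deviation of the query, in percent.
    min_sigma:       hypothetical threshold for "tight variance".
    """
    if not deltas_pct:
        return False
    # The effect must point the same way in every run (no sign flip).
    same_sign = all(d < 0 for d in deltas_pct) or all(d > 0 for d in deltas_pct)
    # Every run must clear the noise floor established by the control.
    clears_floor = all(abs(d) > noise_floor_pct for d in deltas_pct)
    # Every run must be large relative to the run-to-run spread.
    tight = all(abs(d) >= min_sigma * stddev_pct for d in deltas_pct)
    return same_sign and clears_floor and tight
```

With an assumed 5% floor, a query that moves +7.5% in one run and −5% in the next (`counts_as_real([7.5, -5.0], 5.0, 4.5)`) is rejected on the sign flip alone, while a query that is about 30% faster in both runs with a 3% spread passes.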
   
   ### Wins (reproduced across both runs)
   
   | Query | off → on | Speedup |
   |---|---|---|
   | **tpch Q6** | ~132 → ~113 ms | **1.17× (−15%)** |
   | **tpch Q12** | ~196 → ~138 ms | **1.42× (−30%)** |
   
   Both are multi-conjunct `lineitem` filters with a buried selective conjunct,
   exactly the case this targets (`BinaryExpr` `AND` only short-circuits on the
   *leftmost* conjunct). Deltas are 5–10× the run-to-run stddev and do not
   appear in the A/A control.
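
Why a buried selective conjunct matters can be shown with a toy cost model. This is a simplified row-at-a-time sketch, not DataFusion's columnar implementation or the algorithm in this PR: `eval_conjunction` and `reorder_by_selectivity` are hypothetical names, and the predicates are invented for illustration.

```python
def eval_conjunction(conjuncts, rows):
    """Evaluate an AND of predicates left to right, narrowing the candidate rows.

    Returns (surviving_rows, total_predicate_calls).
    """
    calls = 0
    survivors = list(rows)
    for pred in conjuncts:
        if not survivors:
            break  # short-circuit: nothing left to test
        calls += len(survivors)
        survivors = [r for r in survivors if pred(r)]
    return survivors, calls


def reorder_by_selectivity(conjuncts, sample):
    """Order conjuncts by observed pass rate on a sample, most selective first."""
    def pass_rate(pred):
        return sum(1 for r in sample if pred(r)) / len(sample)
    return sorted(conjuncts, key=pass_rate)


rows = range(1000)
written = [
    lambda x: x >= 0,      # passes every row
    lambda x: x % 2 == 0,  # passes half the rows
    lambda x: x < 10,      # selective conjunct, buried last
]

result_written, calls_written = eval_conjunction(written, rows)
reordered = reorder_by_selectivity(written, list(rows))
result_reordered, calls_reordered = eval_conjunction(reordered, rows)

print(calls_written, calls_reordered)      # 2500 1015
print(result_written == result_reordered)  # True
```

In this model the written order costs 2500 predicate calls and the reordered one 1015, while the surviving rows are identical in both cases, because the result of a conjunction does not depend on evaluation order.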
   
   ### No regressions
   
   - The apparent tpcds **Q72** slowdown in the first run (+7.5%) **flipped to
     −5% faster** in the confirmation run. A sign flip under the same flag means
     noise (2 s query, ±90 ms baseline variance). tpcds net is **−0.9% (faster)**.
   - Every other "slower" flag falls inside its suite's A/A control noise floor,
     and the two runs disagree on which queries are slower.
   - **clickbench**: neutral within noise; the engaging multi-conjunct queries
     (Q21/Q22/Q36/Q38–40) are flat. The feature is **off by default**, and when
     a conjunction doesn't benefit from reordering it evaluates the written
     predicate as-is (no compact-once overhead).
   - **0 correctness failures** across all runs (a conjunction's result is
     independent of evaluation order).
   
   **Net:** two solid wins on the target case, no confirmed regressions, no
   correctness changes.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
