adriangb opened a new pull request, #22704:
URL: https://github.com/apache/datafusion/pull/22704

   ## Which issue does this PR close?
   
   <!-- No tracking issue; this is a standalone benchmark contribution. -->
   
   This PR does not close an issue. It adds a benchmark suite that supports the
   ongoing discussion around predicate-ordering / adaptive filter evaluation
   (e.g. the static cheap/expensive reordering in #22343 and the runtime,
   statistics-based reordering explored in #22698). It deliberately benchmarks
   *no specific implementation* — see below.
   
   ## Rationale for this change
   
   Conjunctive (`AND`) filter evaluation in `FilterExec` is a left-deep
   `BinaryExpr(And)` chain, and the order conjuncts are evaluated in can change
   runtime by large factors: the `check_short_circuit` / pre-selection path
   (`PRE_SELECTION_THRESHOLD = 0.2`) physically compacts the batch once a leading
   conjunct passes few enough rows, so a cheap-and-selective predicate that runs
   early saves work for every later predicate. This makes predicate ordering an
   active area of work (static heuristics, runtime/adaptive schemes, cost models).
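
As a rough illustration of why order matters, the sketch below models the pre-selection effect described above. It is a toy model under stated assumptions, not DataFusion's implementation: predicates are independent, each has a fixed per-row cost, and a conjunct only sees the compacted batch once the fraction of rows passing all earlier conjuncts is at or below the threshold. The function name `modeled_work` and the example costs are made up for illustration.

```python
# Toy cost model for a conjunctive filter with pre-selection.
# Assumption: this simplifies the behaviour described in the PR text;
# it is not taken from the DataFusion source.

PRE_SELECTION_THRESHOLD = 0.2  # value quoted in the PR description


def modeled_work(conjuncts, rows):
    """Return modeled work for (cost_per_row, selectivity) pairs in evaluation order."""
    work = 0.0
    passing = 1.0  # fraction of the batch that passed every earlier conjunct
    for cost, sel in conjuncts:
        # Later conjuncts only benefit once few enough rows survive.
        compacted = passing <= PRE_SELECTION_THRESHOLD
        evaluated = rows * passing if compacted else rows
        work += cost * evaluated
        passing *= sel  # independence assumption
    return work


cheap_selective = (1.0, 0.05)   # hypothetical: cheap, passes 5% of rows
expensive_loose = (20.0, 0.90)  # hypothetical: expensive, passes 90% of rows

good = modeled_work([cheap_selective, expensive_loose], rows=100_000)
bad = modeled_work([expensive_loose, cheap_selective], rows=100_000)
print(f"cheap-and-selective first: {good:.0f}")
print(f"expensive first:           {bad:.0f}")
```

In this model the second ordering never crosses the threshold, so the cheap predicate still runs over the full batch and the expensive one is never spared any rows.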
   
   There is currently no benchmark suite that isolates the dimensions that drive
   this. Existing macro-benchmarks (TPC-H/DS, ClickBench) only incidentally
   exercise filter ordering, so they can't tell you *why* a reordering change
   helped or hurt, or guard against regressions in the order-neutral case.
   
   ## What changes are included in this PR?
   
   A new, **implementation-agnostic** SQL benchmark suite,
   `benchmarks/sql_benchmarks/predicate_eval`, built on the existing `.benchmark`
   template framework (no engine code, no new Rust). It measures DataFusion's
   built-in short-circuit by default and sets no engine config of its own; any
   predicate-ordering system under test is toggled purely via its native
   `DATAFUSION_*` environment variable (the bench harness builds its
   `SessionContext` with `SessionConfig::from_env`), so the same scenarios can
   characterise the baseline, a static heuristic, an adaptive scheme, or a future
   cost model and be compared apples-to-apples. Data size and string width are
   controlled by `PRED_ROWS` / `PRED_FILL`.
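
A driver script could assemble a run along these lines. This is a minimal sketch, assuming `BENCH_SUBGROUP`, `PRED_ROWS`, and `PRED_FILL` are read from the environment and that `bench.sh` is invoked from its own directory; the helper `bench_invocation` and the toggle name `DATAFUSION_HYPOTHETICAL_REORDER_FLAG` are placeholders, not real DataFusion settings.

```python
import os


def bench_invocation(subgroup, rows, fill, toggles=None, base_env=None):
    """Build the command and environment for one predicate_eval run."""
    env = dict(os.environ if base_env is None else base_env)
    env["BENCH_SUBGROUP"] = subgroup
    env["PRED_ROWS"] = str(rows)
    env["PRED_FILL"] = str(fill)
    for name, value in (toggles or {}).items():
        # The suite sets no engine config itself; a system under test is
        # switched on only through its own DATAFUSION_* variable.
        if not name.startswith("DATAFUSION_"):
            raise ValueError(f"not a DATAFUSION_* variable: {name}")
        env[name] = value
    return ["./bench.sh", "run", "predicate_eval"], env


# Baseline: no toggles, so only the built-in short-circuit is measured.
baseline_cmd, baseline_env = bench_invocation("costsel", 100_000, 30, base_env={})

# Candidate: same scenario with a placeholder toggle (hypothetical name).
candidate_cmd, candidate_env = bench_invocation(
    "costsel",
    100_000,
    30,
    toggles={"DATAFUSION_HYPOTHETICAL_REORDER_FLAG": "true"},
    base_env={},
)

# To execute a run: subprocess.run(candidate_cmd, env=candidate_env, check=True)
```

Keeping the scenario variables identical and varying only the toggle is what makes the two runs comparable.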
   
   It is organised into 10 subgroups (select with `BENCH_SUBGROUP`), each isolating
   one cost axis of filter evaluation:
   
   | Subgroup | Axis it isolates |
   |---|---|
   | `costsel` | cost-weighted ordering (`cost/(1-sel)`): expensive-but-selective must run first |
   | `cost` | equal selectivity, unequal cost |
   | `selectivity` | equal cost, unequal selectivity |
   | `cardinality` | conjunct count `k = 2/4/8/16` |
   | `width` | string-column width (`FILL` = 2 / 30 / 170 chars) |
   | `scale` | row count `5k / 100k / 5M / 50M` (overhead-dominated → amortized) |
   | `neutral` | order-irrelevant case — pure-overhead / regression guard |
   | `correlation` | independent / positively / anti-correlated predicates (conditional selectivity) |
   | `drift` | selectivity that flips partway through the scan |
   | `nulls` | three-valued-logic path (nulls disable short-circuit) |
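
The `costsel` row can be made concrete with a small worked example. The sketch below assumes independent predicates and ideal per-row short-circuiting (a predicate only runs on rows that passed all earlier ones), which is simpler than the batch-level behaviour in `FilterExec`. The predicate names, costs, and selectivities are invented for illustration.

```python
from itertools import permutations


def expected_cost(order):
    """Expected per-row cost of evaluating (cost, selectivity) pairs in order."""
    total, reach = 0.0, 1.0  # reach = probability a row gets this far
    for cost, sel in order:
        total += reach * cost
        reach *= sel
    return total


def rank(pred):
    """cost/(1-sel): cost paid per row eliminated; lower runs earlier."""
    cost, sel = pred
    return float("inf") if sel >= 1.0 else cost / (1.0 - sel)


# Hypothetical predicates: name -> (cost per row, selectivity)
preds = {
    "cheap_loose": (1.0, 0.9),
    "pricey_selective": (5.0, 0.1),
    "middling": (2.0, 0.5),
}

by_rank = sorted(preds, key=lambda name: rank(preds[name]))
by_cost = sorted(preds, key=lambda name: preds[name][0])  # cheapest-first heuristic
best = min(
    permutations(preds),
    key=lambda names: expected_cost([preds[n] for n in names]),
)

print("rank order:     ", by_rank, expected_cost([preds[n] for n in by_rank]))
print("cheapest first: ", by_cost, expected_cost([preds[n] for n in by_cost]))
print("brute-force min:", list(best))
```

Here the rank order places `pricey_selective` ahead of `cheap_loose` and matches the brute-force optimum, while the cheapest-first order does not.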
   
   Data is synthetic and generated inline by each subgroup's load SQL (no external
   files); `PRED_ROWS` sizes it and `PRED_FILL` sets string width. The suite is wired
   into `bench.sh` (`./bench.sh run predicate_eval`) and documented in
   `benchmarks/sql_benchmarks/README.md`.
   
   The design was informed by surveying how Velox drives the analogous decision
   (it ranks by cycles-per-row-eliminated, `time / (rows_in - rows_out)`), and by
   covering the cases a static cheap/expensive heuristic structurally misses
   (expensive-but-selective, correlated, drifting selectivity).
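
The metric quoted above can be computed from observed counters. The sketch below only implements the formula as written in this description; it makes no claim about how Velox gathers or applies it, and the function name is hypothetical. Substituting `time = cost * rows_in` and `rows_out = sel * rows_in` shows it reduces to the `cost/(1-sel)` rank used by the `costsel` subgroup.

```python
def cycles_per_row_eliminated(time, rows_in, rows_out):
    """time / (rows_in - rows_out); infinite when the filter removes nothing."""
    eliminated = rows_in - rows_out
    if eliminated <= 0:
        return float("inf")
    return time / eliminated


# Hypothetical counters: cost 5 per row over 1000 rows, 100 rows pass.
observed = cycles_per_row_eliminated(time=5.0 * 1000, rows_in=1000, rows_out=100)
closed_form = 5.0 / (1.0 - 0.1)  # cost / (1 - sel)
print(observed, closed_form)
```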
   
   > Note: the `scale` subgroup's `q52`/`q53` build 5M / 50M-row tables; run a
   > single point with `BENCH_QUERY` rather than the whole subgroup if that is too
   > heavy.
   
   ## Are these changes tested?
   
   These are benchmark definitions, not engine code. Each `.benchmark` includes an
   `assert` that the generated table is non-empty, and every subgroup was run
   locally at small `ROWS` to confirm the suite parses, loads, asserts, and
   executes end-to-end. The query results themselves are order-invariant
   (`SELECT count(*) ...`), so any predicate-ordering system can be validated for
   correctness by diffing counts with the optimization on vs. off.
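
The count-diffing check could look like the sketch below. It assumes the counts from each run have already been collected into a mapping of query name to `count(*)`; how they are collected is left open, and the helper name and the counts shown are made-up placeholders.

```python
def count_mismatches(baseline, candidate):
    """Return {query: (count_off, count_on)} for every query whose counts differ."""
    problems = {}
    for query in sorted(set(baseline) | set(candidate)):
        off, on = baseline.get(query), candidate.get(query)
        if off != on:  # also catches a query missing from one run
            problems[query] = (off, on)
    return problems


# Placeholder counts, not real benchmark results.
counts_off = {"q52": 1200, "q53": 34000}
counts_on = {"q52": 1200, "q53": 33999}

print(count_mismatches(counts_off, counts_on))
```

An empty result means the optimization left every count unchanged.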
   
   ## Are there any user-facing changes?
   
   No. This only adds an opt-in benchmark suite and its documentation; no public
   API, engine behavior, or default configuration changes.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

