adriangb opened a new pull request, #22704: URL: https://github.com/apache/datafusion/pull/22704
## Which issue does this PR close?

<!-- No tracking issue; this is a standalone benchmark contribution. -->

This PR does not close an issue. It adds a benchmark suite that supports the ongoing discussion around predicate-ordering / adaptive filter evaluation (e.g. the static cheap/expensive reordering in #22343 and the runtime, statistics-based reordering explored in #22698). It deliberately benchmarks *no specific implementation*; see below.

## Rationale for this change

Conjunctive (`AND`) filter evaluation in `FilterExec` is a left-deep `BinaryExpr(And)` chain, and the order in which conjuncts are evaluated can change runtime by large factors: the `check_short_circuit` / pre-selection path (`PRE_SELECTION_THRESHOLD = 0.2`) physically compacts the batch once a leading conjunct passes few enough rows, so a cheap-and-selective predicate that runs early saves work for every later predicate. This makes predicate ordering an active area (static heuristics, runtime/adaptive schemes, cost models).

There is currently no benchmark suite that isolates the dimensions that drive this. Existing macro-benchmarks (TPC-H/DS, ClickBench) only incidentally exercise filter ordering, so they can't tell you *why* a reordering change helped or hurt, or guard against regressions in the order-neutral case.

## What changes are included in this PR?

A new, **implementation-agnostic** SQL benchmark suite, `benchmarks/sql_benchmarks/predicate_eval`, built on the existing `.benchmark` template framework (no engine code, no new Rust). It measures DataFusion's built-in short-circuit by default and sets no engine config of its own; any predicate-ordering system under test is toggled purely via its native `DATAFUSION_*` environment variable (the bench harness builds its `SessionContext` with `SessionConfig::from_env`), so the same scenarios can characterise the baseline, a static heuristic, an adaptive scheme, or a future cost model and be compared apples-to-apples.
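
To make the ordering sensitivity described in the rationale concrete, here is a minimal sketch of an idealized cost model. It is not DataFusion code: the function name, the `(cost, selectivity)` pairs, and the numbers are hypothetical, predicates are assumed independent, and the model treats evaluation as row-at-a-time, so it ignores the batch-level `PRE_SELECTION_THRESHOLD` behaviour.

```python
def expected_cost(conjuncts):
    """Expected per-row cost of evaluating an AND chain left to right.

    Each conjunct is a (cost, selectivity) pair, where selectivity is the
    fraction of rows that pass. Assumes independent predicates and that a
    conjunct is only evaluated on rows that passed all earlier ones.
    """
    total = 0.0
    surviving = 1.0  # fraction of rows still alive before this conjunct
    for cost, selectivity in conjuncts:
        total += surviving * cost
        surviving *= selectivity
    return total


# Hypothetical numbers, chosen only for illustration.
cheap_selective = (1.0, 0.1)   # cheap, passes 10% of rows
expensive_loose = (20.0, 0.9)  # expensive, passes 90% of rows

good_order = expected_cost([cheap_selective, expensive_loose])
bad_order = expected_cost([expensive_loose, cheap_selective])

print(f"cheap-and-selective first: {good_order:.2f} cost units per row")
print(f"expensive first:           {bad_order:.2f} cost units per row")
```

Under these made-up numbers the two orders differ by roughly a factor of seven (about 3.0 versus about 20.9 cost units per row), which is the kind of gap the suite is meant to expose.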
Data size and string width are controlled by `PRED_ROWS` / `PRED_FILL`.

The suite is organised into 10 subgroups (select with `BENCH_SUBGROUP`), each isolating one cost axis of filter evaluation:

| Subgroup | Axis it isolates |
|---|---|
| `costsel` | cost-weighted ordering (`cost/(1-sel)`): expensive-but-selective must run first |
| `cost` | equal selectivity, unequal cost |
| `selectivity` | equal cost, unequal selectivity |
| `cardinality` | conjunct count `k = 2/4/8/16` |
| `width` | string-column width (`FILL` = 2 / 30 / 170 chars) |
| `scale` | row count `5k / 100k / 5M / 50M` (overhead-dominated → amortized) |
| `neutral` | order-irrelevant case: pure-overhead / regression guard |
| `correlation` | independent / positively / anti-correlated predicates (conditional selectivity) |
| `drift` | selectivity that flips partway through the scan |
| `nulls` | three-valued-logic path (nulls disable short-circuit) |

Data is synthetic and generated inline by each subgroup's load SQL (no external files).

Wired into `bench.sh` (`./bench.sh run predicate_eval`) and documented in `benchmarks/sql_benchmarks/README.md`.

The design was informed by surveying how Velox drives the analogous decision (it ranks by cycles-per-row-eliminated, `time / (rows_in - rows_out)`), and by covering the cases a static cheap/expensive heuristic structurally misses (expensive-but-selective, correlated, drifting selectivity).

> Note: the `scale` subgroup's `q52`/`q53` build 5M / 50M-row tables; run a single point with `BENCH_QUERY` rather than the whole subgroup if that is too heavy.

## Are these changes tested?

These are benchmark definitions, not engine code. Each `.benchmark` includes an `assert` that the generated table is non-empty, and every subgroup was run locally at small `ROWS` to confirm the suite parses, loads, asserts, and executes end-to-end.
The query results themselves are order-invariant (`SELECT count(*) ...`), so any predicate-ordering system can be validated for correctness by diffing counts with the optimization on vs. off.

## Are there any user-facing changes?

No. This only adds an opt-in benchmark suite and its documentation; no public API, engine behavior, or default configuration changes.
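
For reference, the two ranking formulas mentioned in this description, `cost/(1-sel)` from the `costsel` subgroup and `time / (rows_in - rows_out)` from the Velox survey, can be sketched as follows. This is an illustration only: the function names and the sample conjuncts are hypothetical, `sel` is assumed to be the fraction of rows that pass, and "lower rank runs first" is an assumption that matches the classic independent-predicate ordering rule rather than any implementation referenced in this PR.

```python
import math


def rank_cost_over_drop(cost, selectivity):
    """Rank `cost / (1 - sel)`; lower ranks run earlier.

    Assumes `selectivity` is the fraction of rows that pass the predicate.
    """
    dropped = 1.0 - selectivity
    return math.inf if dropped <= 0.0 else cost / dropped


def rank_time_per_row_eliminated(time, rows_in, rows_out):
    """Rank `time / (rows_in - rows_out)` from observed counters."""
    eliminated = rows_in - rows_out
    return math.inf if eliminated <= 0 else time / eliminated


def order_conjuncts(conjuncts):
    """Sort (name, cost, selectivity) triples by ascending rank."""
    return sorted(conjuncts, key=lambda c: rank_cost_over_drop(c[1], c[2]))


# Hypothetical conjuncts, for illustration only.
conjuncts = [
    ("cheap_loose", 1.0, 0.95),           # cheap, but drops only 5% of rows
    ("expensive_selective", 10.0, 0.01),  # 10x the cost, drops 99% of rows
]
ordered = [name for name, _, _ in order_conjuncts(conjuncts)]
print(ordered)
```

With these numbers the expensive-but-selective conjunct ranks ahead of the cheap one, which is the case the `costsel` subgroup targets and which a pure cheap-first heuristic would order the other way. The two formulas coincide when `time` is per-row cost times `rows_in` and `rows_out` is `sel * rows_in`.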
