Re: [PR] bench: add predicate_eval SQL micro-benchmark suite for conjunctive filter evaluation [datafusion]

via GitHub Thu, 04 Jun 2026 06:32:53 -0700


adriangb commented on PR #22704:
URL: https://github.com/apache/datafusion/pull/22704#issuecomment-4622619165


   # Results
   
   **Run config (defaults):** `BENCH_NAME=predicate_eval PRED_ROWS=1000000` 
(all 10 subgroups, scale subgroup at its own 5k/100k/5M/50M sizes), criterion 
10 samples/query, no engine config set (measures DataFusion's built-in 
left-deep AND short-circuit).
   
   This was run on a noisy laptop, so some run-to-run noise is expected.
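   For readers unfamiliar with the "left-deep AND short-circuit" mentioned in the run config, here is a toy Python model of the idea. It is a sketch under stated assumptions, not DataFusion's implementation: `eval_and_left_deep`, the list-of-lists batches, and the two lambdas are all hypothetical, and the only behavior modelled is "skip the remaining conjuncts for a batch once no row can still pass".

```python
from typing import Callable, List, Sequence, Tuple

Predicate = Callable[[int], bool]


def eval_and_left_deep(
    batches: Sequence[Sequence[int]],
    predicates: Sequence[Predicate],
) -> Tuple[List[List[bool]], List[int]]:
    """Evaluate p0 AND p1 AND ... left to right, one batch at a time.

    Returns the per-batch boolean masks and, for each predicate, the number
    of batches it was actually evaluated on (toy cost counter).
    """
    evaluated = [0] * len(predicates)
    masks: List[List[bool]] = []
    for batch in batches:
        mask = [True] * len(batch)
        for i, pred in enumerate(predicates):
            if not any(mask):
                break  # short-circuit: no row in this batch can still pass
            evaluated[i] += 1
            result = [pred(value) for value in batch]
            mask = [m and r for m, r in zip(mask, result)]
        masks.append(mask)
    return masks, evaluated


batches = [[1, 2, 3], [10, 20, 30]]
selective = lambda x: x > 5      # rejects every row of the first batch
unselective = lambda x: x < 100  # passes every row (stand-in for a costly predicate)

masks_a, evaluated_a = eval_and_left_deep(batches, [selective, unselective])
masks_b, evaluated_b = eval_and_left_deep(batches, [unselective, selective])
print(evaluated_a, evaluated_b)
```

   Both orders produce the same masks, but in this toy setup the second predicate runs on fewer batches when the selective predicate comes first. That order sensitivity is what the `*_first` / `*_last` query pairs below are probing.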
   
   ## `costsel` — cost & selectivity point opposite ways (expensive predicate is the selective one)
   
   | Benchmark | Median | Mean | Mean 95% CI |
   |---|--:|--:|--|
   | `costsel_q01_regexp_selective_last` | 5.113 ms | 5.184 ms | [5.097 ms, 5.287 ms] |
   | `costsel_q02_regexp_selective_first` | 1.642 ms | 1.654 ms | [1.636 ms, 1.675 ms] |
   | `costsel_q03_cheap_unselective_then_expensive_selective` | 1.264 ms | 1.265 ms | [1.250 ms, 1.283 ms] |
   
   ## `cost` — per-predicate cost, at equal selectivity
   
   | Benchmark | Median | Mean | Mean 95% CI |
   |---|--:|--:|--|
   | `cost_q10_expensive_first` | 2.261 ms | 2.545 ms | [2.062 ms, 3.152 ms] |
   | `cost_q11_cheap_first` | 653.311 µs | 660.827 µs | [648.549 µs, 678.636 µs] |
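   To put the ordering effect in the `costsel` and `cost` tables into a single number, the medians can be turned into ratios. The snippet below is only a convenience sketch: `parse_duration` and `slowdown` are hypothetical helpers, and the inputs are the median values copied from the tables above. Given the noise caveat, the ratios should be read as rough indications.

```python
# Unit suffixes as printed by criterion in the tables above.
UNIT_TO_SECONDS = {"ns": 1e-9, "µs": 1e-6, "ms": 1e-3, "s": 1.0}


def parse_duration(text: str) -> float:
    """Convert a string such as '653.311 µs' into seconds."""
    value, unit = text.split()
    return float(value) * UNIT_TO_SECONDS[unit]


def slowdown(slow: str, fast: str) -> float:
    """Ratio of two medians; values above 1 mean `slow` took longer."""
    return parse_duration(slow) / parse_duration(fast)


# costsel_q01 (regexp last) vs costsel_q02 (regexp first)
costsel_ratio = slowdown("5.113 ms", "1.642 ms")
# cost_q10 (expensive first) vs cost_q11 (cheap first)
cost_ratio = slowdown("2.261 ms", "653.311 µs")

print(f"costsel q01/q02: {costsel_ratio:.2f}x")
print(f"cost q10/q11:    {cost_ratio:.2f}x")
```

   With these medians the two ratios come out at roughly 3.1x and 3.5x respectively.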
   
   ## `selectivity` — per-predicate selectivity, at equal cost
   
   | Benchmark | Median | Mean | Mean 95% CI |
   |---|--:|--:|--|
   | `selectivity_q20_unselective_first` | 454.933 µs | 455.685 µs | [449.862 µs, 462.258 µs] |
   | `selectivity_q21_selective_first` | 420.744 µs | 422.432 µs | [418.718 µs, 426.746 µs] |
   
   ## `cardinality` — conjunct count k = 2/4/8/16
   
   | Benchmark | Median | Mean | Mean 95% CI |
   |---|--:|--:|--|
   | `cardinality_q30_k2` | 424.973 µs | 429.211 µs | [414.203 µs, 447.021 µs] |
   | `cardinality_q31_k4` | 608.709 µs | 613.319 µs | [590.725 µs, 637.308 µs] |
   | `cardinality_q32_k8` | 3.018 ms | 2.874 ms | [2.195 ms, 3.556 ms] |
   | `cardinality_q33_k16` | 3.996 ms | 4.392 ms | [3.787 ms, 5.123 ms] |
   
   ## `width` — string-column width (PRED_FILL = 2/30/170 chars)
   
   | Benchmark | Median | Mean | Mean 95% CI |
   |---|--:|--:|--|
   | `width_q40_narrow` | 8.387 ms | 7.571 ms | [6.594 ms, 8.428 ms] |
   | `width_q41_wide` | 5.822 ms | 6.005 ms | [5.740 ms, 6.326 ms] |
   | `width_q42_xwide` | 30.149 ms | 30.270 ms | [29.570 ms, 31.062 ms] |
   
   ## `scale` — row count 5k/100k/5M/50M
   
   | Benchmark | Median | Mean | Mean 95% CI |
   |---|--:|--:|--|
   | `scale_q50_5k` | 195.848 µs | 194.429 µs | [190.381 µs, 197.835 µs] |
   | `scale_q51_100k` | 300.590 µs | 301.308 µs | [299.721 µs, 303.113 µs] |
   | `scale_q52_5m` | 5.157 ms | 5.231 ms | [5.114 ms, 5.361 ms] |
   | `scale_q53_50m` | 48.703 ms | 48.883 ms | [48.221 ms, 49.680 ms] |
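   One way to read the `scale` table is to divide each median by its row count. The sketch below does that arithmetic; it assumes the sizes in the heading mean exactly 5,000 / 100,000 / 5,000,000 / 50,000,000 rows, and `ns_per_row` is a hypothetical helper. Any interpretation of the trend (for example, fixed per-query overhead at small sizes) is an inference, not something the comment states.

```python
# (assumed row count, median in seconds) copied from the scale table above
SCALE_MEDIANS = {
    "scale_q50_5k": (5_000, 195.848e-6),
    "scale_q51_100k": (100_000, 300.590e-6),
    "scale_q52_5m": (5_000_000, 5.157e-3),
    "scale_q53_50m": (50_000_000, 48.703e-3),
}


def ns_per_row(rows: int, seconds: float) -> float:
    """Median wall time divided by row count, in nanoseconds per row."""
    return seconds / rows * 1e9


per_row = {
    name: ns_per_row(rows, seconds)
    for name, (rows, seconds) in SCALE_MEDIANS.items()
}

for name, value in per_row.items():
    print(f"{name}: {value:.2f} ns/row")
```

   Under those assumptions the per-row figure drops from about 39 ns at 5k rows to about 1 ns at 5M and 50M rows.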
   
   ## `neutral` — order-insensitive control (equal cost, none selective)
   
   | Benchmark | Median | Mean | Mean 95% CI |
   |---|--:|--:|--|
   | `neutral_q60_cheap_uniform` | 797.491 µs | 816.330 µs | [768.721 µs, 869.888 µs] |
   | `neutral_q61_expensive_uniform` | 4.263 ms | 4.463 ms | [4.072 ms, 4.905 ms] |
   
   ## `correlation` — conditional vs marginal selectivity (indep/pos/anti-correlated)
   
   | Benchmark | Median | Mean | Mean 95% CI |
   |---|--:|--:|--|
   | `correlation_q70_independent` | 443.514 µs | 455.220 µs | [442.026 µs, 472.727 µs] |
   | `correlation_q71_positive` | 353.478 µs | 371.662 µs | [349.020 µs, 401.121 µs] |
   | `correlation_q72_anti` | 370.724 µs | 372.424 µs | [356.075 µs, 389.509 µs] |
   
   ## `drift` — selectivity changes across the scan
   
   | Benchmark | Median | Mean | Mean 95% CI |
   |---|--:|--:|--|
   | `drift_q80_a_then_b` | 355.596 µs | 358.509 µs | [351.533 µs, 365.869 µs] |
   | `drift_q81_b_then_a` | 323.365 µs | 323.517 µs | [319.436 µs, 327.678 µs] |
   
   ## `nulls` — null density (two- vs three-valued predicate results)
   
   | Benchmark | Median | Mean | Mean 95% CI |
   |---|--:|--:|--|
   | `nulls_q90_no_nulls_control` | 415.150 µs | 434.152 µs | [412.615 µs, 463.365 µs] |
   | `nulls_q91_half_null` | 382.150 µs | 391.865 µs | [375.653 µs, 409.933 µs] |
   
   
   Total runtime is ~5 min, but it is dominated by criterion overhead (warmup, 
etc.).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

