zhuqi-lucas commented on issue #22405:
URL: https://github.com/apache/datafusion/issues/22405#issuecomment-4545649696

   ## Status (May 2026)
   
   Phase 1 prototype is up as #22518 (not yet merged):
   
   - ✅ A/B sampling (measure `partial_ns/row` + `passthrough_ns/row` + ratio)
   - ✅ Cost crossover decision: `skip ⇔ ratio > passthrough_ns / partial_ns` 
(derived from the closed-form `cost_keep` vs `cost_skip` comparison, no magic 
constants)
   - ✅ ≤ 1 % overhead (10k passthrough sample per partition; default config 
keeps the operator-wide `elapsed_compute` timer doing the measurement, no extra 
`Instant::now()` in the hot path)
   - ✅ Diagnostic gauges exposed via EXPLAIN ANALYZE: 
`partial_agg_probe_partial_ns_per_row`, `_passthrough_ns_per_row`, 
`_ratio_per_mille`, `_cost_decision_skip`
   - ✅ ClickBench partitioned (ARM Neoverse-V2): 10 queries faster (Q19 +1.43×, 
Q39 +1.30×, Q29 +1.23×, Q18 +1.12×, …), 1 minor regression (Q42 ~15 ms, noise), 
total **−1.5 %**
   - ⏳ **Segment-level re-probing is deferred.** Attempted in #22518 but 
reverted: when the probe re-enters the partial-agg path after a committed skip 
segment, the operator panics at `multi_group_by/primitive.rs:156` with an 
out-of-bounds `lhs_row`. It looks like `GroupValues::emit(EmitTo::All)` clears 
the per-column arrays while the hash→index map retains stale entries from 
before the emit. That is fine for the existing one-shot skip path, but it 
breaks any path that goes `partial → skip → partial`. Worth tackling as a 
follow-up once those reset semantics are sorted out.
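
   For readers skimming the thread, the crossover rule from the second bullet 
can be sketched roughly as below. This is a minimal illustration, not the code 
in #22518: `ProbeSample`, `should_skip`, and `ratio_per_mille` are hypothetical 
names, `ratio` is simply whatever the probe reports (assumed here to be a 
groups-per-input-row reduction ratio, which the comment does not spell out), 
and the zero-cost guard is an added assumption.

```rust
/// Hypothetical container for the three values the A/B probe measures.
#[derive(Debug, Clone, Copy)]
struct ProbeSample {
    /// Measured cost of the partial-aggregation path, ns per input row.
    partial_ns_per_row: f64,
    /// Measured cost of the passthrough (skip) path, ns per input row.
    passthrough_ns_per_row: f64,
    /// Ratio reported by the probe (assumed: output groups / input rows).
    ratio: f64,
}

/// Applies the rule quoted above: skip <=> ratio > passthrough_ns / partial_ns.
fn should_skip(sample: &ProbeSample) -> bool {
    // Assumption: with no usable partial-path timing, keep aggregating.
    if sample.partial_ns_per_row <= 0.0 {
        return false;
    }
    sample.ratio > sample.passthrough_ns_per_row / sample.partial_ns_per_row
}

/// Scales a fractional ratio to an integer per-mille value, the unit the
/// `_ratio_per_mille` gauge name suggests.
fn ratio_per_mille(ratio: f64) -> u64 {
    (ratio * 1000.0).round().max(0.0) as u64
}
```

   With `partial_ns_per_row = 40` and `passthrough_ns_per_row = 10` the 
threshold is 0.25, so a probe ratio of 0.9 picks skip and a ratio of 0.1 keeps 
partial aggregation.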
   
   Will keep this issue open until #22518 merges and we have post-merge 
benchmark data; the re-probing follow-up should land as a separate PR.
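
   To make the suspected failure mode behind the deferred re-probing concrete, 
here is a toy model of it. It is not DataFusion's `GroupValues`: 
`ToyGroupValues`, `intern`, `emit_all`, and `reprobe_after_emit` are 
hypothetical names, and the model only assumes what the bullet above 
hypothesizes, namely that an emit empties the value array while the key→index 
map keeps its old entries.

```rust
use std::collections::HashMap;

/// Toy stand-in for a group-values store (NOT DataFusion's implementation).
struct ToyGroupValues {
    /// Stored group keys, playing the role of the per-column arrays.
    values: Vec<i64>,
    /// Key -> index into `values`, playing the role of the hash->index map.
    map: HashMap<i64, usize>,
}

impl ToyGroupValues {
    fn new() -> Self {
        Self { values: Vec::new(), map: HashMap::new() }
    }

    /// Returns the group index for `key`, inserting it if unseen.
    /// Reports an error instead of panicking when the map points at a row
    /// that no longer exists or no longer holds `key`.
    fn intern(&mut self, key: i64) -> Result<usize, String> {
        if let Some(&idx) = self.map.get(&key) {
            return match self.values.get(idx) {
                Some(stored) if *stored == key => Ok(idx),
                _ => Err(format!(
                    "stale index {} for key {} ({} rows stored)",
                    idx,
                    key,
                    self.values.len()
                )),
            };
        }
        let idx = self.values.len();
        self.values.push(key);
        self.map.insert(key, idx);
        Ok(idx)
    }

    /// Emits all groups. `reset_map = false` models the suspected bug;
    /// `reset_map = true` models one possible fix (clear both structures).
    fn emit_all(&mut self, reset_map: bool) -> Vec<i64> {
        if reset_map {
            self.map.clear();
        }
        std::mem::take(&mut self.values)
    }
}

/// Runs a `partial -> skip -> partial` sequence and returns the result of
/// the first lookup after re-entering the partial path.
fn reprobe_after_emit(reset_map: bool) -> Result<usize, String> {
    let mut groups = ToyGroupValues::new();
    groups.intern(7)?;
    groups.intern(9)?;
    groups.emit_all(reset_map); // committed skip segment starts here
    groups.intern(9) // probe re-enters the partial path
}
```

   In this model `reprobe_after_emit(false)` fails on the stale entry, while 
`reprobe_after_emit(true)` succeeds. Whether clearing the map on emit is the 
right fix for the real operator is a separate question for the follow-up PR.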
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

