alamb commented on code in PR #22518:
URL: https://github.com/apache/datafusion/pull/22518#discussion_r3375729952


##########
datafusion/common/src/config.rs:
##########
@@ -648,6 +648,41 @@ config_namespace! {
         /// aggregation ratio check and trying to switch to skipping aggregation mode
         pub skip_partial_aggregation_probe_rows_threshold: usize, default = 100_000
 
+        /// (experimental) When true, run an A/B sampling window after
+        /// the partial probe completes (see
+        /// `skip_partial_aggregation_probe_rows_threshold`): route the
+        /// next `skip_partial_aggregation_ab_sampling_rows` input rows
+        /// through the passthrough (`transform_to_states`) path,
+        /// measure `passthrough_ns/row`, and compare it against the
+        /// previously measured `partial_ns/row` plus the observed
+        /// `num_groups / input_rows` ratio. Skip partial aggregation
+        /// iff `ratio > passthrough_ns / partial_ns` — the cost
+        /// crossover from the closed-form comparison of
+        /// `keep_partial` vs `skip_partial` total work.
+        ///
+        /// Targets ClickBench Q18-shape queries where the ratio
+        /// (~0.56) sits below the fixed 0.8 threshold so partial agg
+        /// keeps running, but the absolute work (heavy variable-length
+        /// keys, complex aggregates) makes it net-negative. The
+        /// existing `skip_partial_aggregation_probe_ratio_threshold`
+        /// short-circuit still fires before A/B when it applies.
+        ///
+        /// EXPLAIN ANALYZE surfaces the measured numbers via four dev
+        /// gauges: `partial_agg_probe_partial_ns_per_row`,
+        /// `partial_agg_probe_passthrough_ns_per_row`,
+        /// `partial_agg_probe_ratio_per_mille`, and
+        /// `partial_agg_probe_cost_decision_skip`.
+        pub skip_partial_aggregation_use_cost_model: bool, default = true
+
+        /// Number of input rows used in the A/B sampling window after the

Review Comment:
   Is the observation here that the hard-coded limits / heuristics we built into
the probe decision don't always work in real-world situations, and thus an
actual runtime decision is better?
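
To make the question concrete, here is a minimal sketch of how I read the rule described in the doc comment above: the fixed ratio threshold short-circuits first, and otherwise the measured per-row costs decide. The function name, the parameter names, the `>=` comparison for the fixed threshold, and the handling of empty input are assumptions for illustration only, not code from this PR.

```rust
/// Hypothetical helper (not from the PR): decide whether to skip partial
/// aggregation, following the rule stated in the doc comment.
///
/// * `num_groups` / `input_rows`: counts observed during the probe
/// * `partial_ns_per_row`: measured cost of the partial aggregation path
/// * `passthrough_ns_per_row`: measured cost of the passthrough path
/// * `fixed_ratio_threshold`: the existing probe ratio threshold (e.g. 0.8)
fn should_skip_partial(
    num_groups: u64,
    input_rows: u64,
    partial_ns_per_row: f64,
    passthrough_ns_per_row: f64,
    fixed_ratio_threshold: f64,
) -> bool {
    // Assumption: with no rows or no usable timing, keep partial aggregation.
    if input_rows == 0 || partial_ns_per_row <= 0.0 {
        return false;
    }

    let ratio = num_groups as f64 / input_rows as f64;

    // The existing fixed-threshold short-circuit still fires before the
    // A/B comparison when it applies (`>=` is an assumption here).
    if ratio >= fixed_ratio_threshold {
        return true;
    }

    // Cost crossover from the doc comment:
    // skip iff ratio > passthrough_ns / partial_ns.
    ratio > passthrough_ns_per_row / partial_ns_per_row
}
```

With the Q18-like ratio of 0.56 from the doc comment, the fixed 0.8 threshold alone keeps partial aggregation, but the rule above skips it whenever passthrough costs less than 56% of the partial path per row. The timing values in that reading are made up for illustration, not measurements.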



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

