Re: [PR] refactor(hash-aggr): Simplify aggregate hash table [datafusion]

via GitHub Sat, 04 Jul 2026 05:00:50 -0700


2010YOUY01 commented on code in PR #23309:
URL: https://github.com/apache/datafusion/pull/23309#discussion_r3523163131



##########
datafusion/physical-plan/src/aggregates/aggregate_hash_table/common.rs:
##########
@@ -60,19 +72,36 @@ pub(in crate::aggregates) struct FinalMarker;
 /// [`GroupsAccumulator`]. Both use columnar storage so aggregation can stay
 /// vectorized.
 ///
-/// # Marker Type
-/// `AggrMode` selects the aggregate semantics.
+/// # Mode
+///
+/// [`AggregateTableMode`] controls how input batches update accumulator state
+/// and how output batches are materialized.
 ///
-/// e.g. `AggregateHashTable::<PartialMarker>::new(...)` creates an aggregate 
hash table
-/// for the partial hash aggregate stage, the input schema is raw rows and 
output
-/// schema is intermediate states.
+/// ```text
+/// Example: `AVG(x) GROUP BY k`.
 ///
-/// It is a zero-sized compile-time marker, so each stage keeps its update 
logic
-/// in a separate impl block, to make the behavior difference explicit.
-pub(in crate::aggregates) struct AggregateHashTable<AggrMode> {
+/// In `Partial` mode, the table stores partial state:
+///     k, sum(x), count(x)
+/// The input batch contains raw values:
+///     k, x
+/// The output batch also contains partial state:
+///     k, sum(x), count(x)
+/// ```
+///
+/// So input uses [`GroupsAccumulator::update_batch`], and output uses
+/// [`GroupsAccumulator::state`].
+///
+/// Other modes use different input/output combinations:
+/// - `Final`: merge_batch + evaluate
+/// - `PartialReduce`: merge_batch + state
+/// - `Single`: update_batch + evaluate
+pub(in crate::aggregates) struct AggregateHashTable {

Review Comment:
   The templated-function approach seems like a good idea. I’ll give it a try 
and see how it looks.
   
   If the templated function is not expressive enough in some places/at some 
point, we can still fully separate the implementation by aggregation variant.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] refactor(hash-aggr): Simplify aggregate hash table [datafusion]

Reply via email to