alamb commented on code in PR #6800:
URL: https://github.com/apache/arrow-datafusion/pull/6800#discussion_r1261178612


##########
datafusion/physical-expr/src/aggregate/average.rs:
##########
@@ -383,6 +435,189 @@ impl RowAccumulator for AvgRowAccumulator {
     }
 }
 
+/// An accumulator to compute the average of `[PrimitiveArray<T>]`.
+/// Stores values as native types, and does overflow checking
+///
+/// F: Function that calculates the average value from a sum of
+/// T::Native and a total count
+#[derive(Debug)]
+struct AvgGroupsAccumulator<T, F>
+where
+    T: ArrowNumericType + Send,
+    F: Fn(T::Native, u64) -> Result<T::Native> + Send,
+{
+    /// The type of the internal sum
+    sum_data_type: DataType,
+
+    /// The type of the returned sum
+    return_data_type: DataType,
+
+    /// Count per group (use u64 to make UInt64Array)
+    counts: Vec<u64>,
+
+    /// Sums per group, stored as the native type
+    sums: Vec<T::Native>,

Review Comment:
   Ah -- I see what you are saying. I think we could potentially use a 
`StructArray` for the state (which would be a single "column" in Arrow), but the 
underlying storage would still be two separate contiguous arrays.
   
   Maybe we could use `FixedSizeBinaryArray` 🤔 and pack/unpack the 
`(sum, count)` tuples into values of the appropriate size.
   
   It would be an interesting experiment.
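
   To make the pack/unpack idea concrete, here is a rough sketch of the byte 
layout only. It assumes `f64` sums and `u64` counts (16 bytes per group) and uses 
a plain `Vec<u8>` rather than the actual `FixedSizeBinaryArray` API; the names 
`RECORD_WIDTH`, `pack_state`, and `unpack_state` are made up for illustration 
and are not part of this PR.

```rust
/// Hypothetical record width: 8 bytes for an f64 sum + 8 bytes for a u64 count.
const RECORD_WIDTH: usize = 16;

/// Pack per-group sums and counts into one contiguous buffer of fixed-width
/// records, laid out as [sum_0 | count_0 | sum_1 | count_1 | ...].
fn pack_state(sums: &[f64], counts: &[u64]) -> Vec<u8> {
    assert_eq!(sums.len(), counts.len(), "one count per sum");
    let mut buf = Vec::with_capacity(sums.len() * RECORD_WIDTH);
    for (sum, count) in sums.iter().zip(counts) {
        // Little-endian is an arbitrary choice for this sketch.
        buf.extend_from_slice(&sum.to_le_bytes());
        buf.extend_from_slice(&count.to_le_bytes());
    }
    buf
}

/// Unpack the buffer back into separate sum and count vectors.
fn unpack_state(buf: &[u8]) -> (Vec<f64>, Vec<u64>) {
    assert_eq!(buf.len() % RECORD_WIDTH, 0, "buffer must hold whole records");
    let groups = buf.len() / RECORD_WIDTH;
    let mut sums = Vec::with_capacity(groups);
    let mut counts = Vec::with_capacity(groups);
    for record in buf.chunks_exact(RECORD_WIDTH) {
        let (sum_bytes, count_bytes) = record.split_at(8);
        let mut sum_raw = [0u8; 8];
        let mut count_raw = [0u8; 8];
        sum_raw.copy_from_slice(sum_bytes);
        count_raw.copy_from_slice(count_bytes);
        sums.push(f64::from_le_bytes(sum_raw));
        counts.push(u64::from_le_bytes(count_raw));
    }
    (sums, counts)
}
```

   With this layout each group's sum and count sit next to each other in memory, 
at the cost of a copy on every pack and unpack, so whether it beats two separate 
arrays would need to be measured.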



