alamb commented on code in PR #6800:
URL: https://github.com/apache/arrow-datafusion/pull/6800#discussion_r1261178612

##########
datafusion/physical-expr/src/aggregate/average.rs:
##########
@@ -383,6 +435,189 @@ impl RowAccumulator for AvgRowAccumulator {
     }
 }
 
+/// An accumulator to compute the average of `[PrimitiveArray<T>]`.
+/// Stores values as native types, and does overflow checking
+///
+/// F: Function that calcuates the average value from a sum of
+/// T::Native and a total count
+#[derive(Debug)]
+struct AvgGroupsAccumulator<T, F>
+where
+    T: ArrowNumericType + Send,
+    F: Fn(T::Native, u64) -> Result<T::Native> + Send,
+{
+    /// The type of the internal sum
+    sum_data_type: DataType,
+
+    /// The type of the returned sum
+    return_data_type: DataType,
+
+    /// Count per group (use u64 to make UInt64Array)
+    counts: Vec<u64>,
+
+    /// Sums per group, stored as the native type
+    sums: Vec<T::Native>,

Review Comment:
   Ah -- I see what you are saying -- I think we could potentially use a `StructArray` for the state (which would be a single "column" in arrow), but the underlying storage is still two separate contiguous arrays.

   Maybe we could use `FixedSizeBinaryArray` 🤔 and pack/unpack the tuples to the appropriate size.

   It would be an interesting experiment.
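
   To make the pack/unpack idea concrete, here is a rough sketch using only the standard library. It assumes an `i64` sum and a `u64` count, so each record is 16 bytes; `RECORD_WIDTH`, `pack_state`, and `unpack_state` are hypothetical names for illustration, not anything in DataFusion or arrow. A real experiment would hand the resulting buffer to a `FixedSizeBinaryArray` with that byte width, which is not shown here.

```rust
use std::convert::TryInto;

/// Hypothetical record width: 8 bytes for an i64 sum + 8 bytes for a u64 count.
const RECORD_WIDTH: usize = 16;

/// Interleave parallel `sums` and `counts` into one contiguous byte buffer,
/// one fixed-width (sum, count) record per group, little-endian.
fn pack_state(sums: &[i64], counts: &[u64]) -> Vec<u8> {
    assert_eq!(sums.len(), counts.len(), "one count per sum");
    let mut buf = Vec::with_capacity(sums.len() * RECORD_WIDTH);
    for (sum, count) in sums.iter().zip(counts) {
        buf.extend_from_slice(&sum.to_le_bytes());
        buf.extend_from_slice(&count.to_le_bytes());
    }
    buf
}

/// Split a packed buffer back into separate `sums` and `counts` vectors.
fn unpack_state(buf: &[u8]) -> (Vec<i64>, Vec<u64>) {
    assert_eq!(buf.len() % RECORD_WIDTH, 0, "buffer must hold whole records");
    let groups = buf.len() / RECORD_WIDTH;
    let mut sums = Vec::with_capacity(groups);
    let mut counts = Vec::with_capacity(groups);
    for record in buf.chunks_exact(RECORD_WIDTH) {
        // First 8 bytes are the sum, last 8 bytes are the count.
        let (sum_bytes, count_bytes) = record.split_at(8);
        sums.push(i64::from_le_bytes(sum_bytes.try_into().unwrap()));
        counts.push(u64::from_le_bytes(count_bytes.try_into().unwrap()));
    }
    (sums, counts)
}
```

   The trade-off this illustrates: the packed form keeps each group's sum and count adjacent in memory, but every read or write pays for a byte-level encode/decode, whereas the two-`Vec` layout in the PR can be handed to arrow as two primitive arrays without repacking.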