alamb commented on code in PR #7242:
URL: https://github.com/apache/arrow-datafusion/pull/7242#discussion_r1290416046


##########
docs/source/user-guide/sql/scalar_functions.md:
##########
@@ -1495,6 +1496,26 @@ from_unixtime(expression)
 - [make_list](#make_list)
 - [trim_array](#trim_array)
 
+### `array_aggregate`
+
+Allows the execution of arbitrary existing aggregate functions on the elements 
of a list.

Review Comment:
   ```suggestion
   Allows the execution of arbitrary existing aggregate function `name` on the 
elements of a list.
   ```



##########
datafusion/core/tests/sqllogictests/test_files/array.slt:
##########
@@ -2361,6 +2361,43 @@ from flatten_table;
 [1, 2, 3] [1, 2, 3, 4, 5, 6] [1, 2, 3] [1.0, 2.1, 2.2, 3.2, 3.3, 3.4]
 [1, 2, 3, 4, 5, 6] [8] [1, 2, 3] [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
 
+# array aggregate function
+## array aggregate
+query I
+select array_aggregate([1, 3, 5, 7], 'sum');

Review Comment:
   wow -- this is pretty neat



##########
datafusion/physical-expr/src/array_expressions.rs:
##########
@@ -2150,6 +2150,61 @@ pub fn array_has_all(args: &[ArrayRef]) -> 
Result<ArrayRef> {
     Ok(Arc::new(boolean_builder.finish()))
 }
 
+macro_rules! array_sum_internal(
+    ($ARRAY:expr, $ARRAY_TYPE:ident, $DATA_TYPE:ident) => {{
+        let mut values = vec![];
+        for arr in $ARRAY.iter() {
+            if let Some(arr) = arr {
+                let arr = downcast_arg!(arr, $ARRAY_TYPE);
+                let sum = arr.values().iter().sum::<$DATA_TYPE>();
+                values.push(sum);
+            }
+        }
+        Ok(Arc::new($ARRAY_TYPE::from(values)))
+    }}
+);
+
+/// array_aggregate SQL function
+pub fn array_aggregate(args: &[ArrayRef]) -> Result<ArrayRef> {
+    assert_eq!(args.len(), 2);
+    let func_name = args[1].as_string::<i32>().value(0);
+    let args = &args[0..1];
+    match func_name {
+        "sum" => array_sum(args),

Review Comment:
   If you implement this in terms of an 
[`AggregateExpr`](https://docs.rs/datafusion/latest/datafusion/physical_expr/trait.AggregateExpr.html)
 and then 
[Accumulator](https://docs.rs/datafusion/latest/datafusion/logical_expr/trait.Accumulator.html#)
 you could use the existing aggregate implementations. This would have several 
benefits:
   1. We would avoid code duplication and get all existing aggregates "for free"
   2. As new aggregates were added they would also be usable as an array 
aggregate
   3. They would work for non primitive types (e.g. DecimalArray, strings, etc)
   
   The basic idea would be to look up the [aggregate 
expr](https://docs.rs/datafusion/latest/datafusion/logical_expr/enum.Expr.html#variant.AggregateFunction)
 during the analysis / parsing phase, create a physical version, and then use 
it here to instantiate the accumulator
   
   Maybe you could prototype this approach by simply hard coding the mapping 
`sum` --> 
[[`AggregateFunction::sum`]](https://docs.rs/datafusion/latest/datafusion/physical_expr/aggregate/build_in/enum.AggregateFunction.html)
 here and using that to create an `Accumulator` 🤔 
   
   



##########
datafusion/core/tests/sqllogictests/test_files/array.slt:
##########
@@ -2361,6 +2361,43 @@ from flatten_table;
 [1, 2, 3] [1, 2, 3, 4, 5, 6] [1, 2, 3] [1.0, 2.1, 2.2, 3.2, 3.3, 3.4]
 [1, 2, 3, 4, 5, 6] [8] [1, 2, 3] [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
 
+# array aggregate function
+## array aggregate
+query I
+select array_aggregate([1, 3, 5, 7], 'sum');
+----
+16
+
+## array sum
+query IRI
+select array_sum([1, 3, 5, 7]),
+       array_sum([1.1, 2.2, 3.3]),
+       list_sum([1, -1, 0, 23]);
+----
+16 6.6 23
+
+# TODO: Support nulls in array.
+# query error DataFusion error: This feature is not implemented: Arrays with 
different types are not supported: \{Int64, Null\}
+# select array_sum([1, null, 3, null]);

Review Comment:
   Yes, I agree -- I think if we add a coercion pass that tries to coerce array 
elements into the same type this will magically start working



##########
datafusion/physical-expr/src/array_expressions.rs:
##########
@@ -2150,6 +2150,61 @@ pub fn array_has_all(args: &[ArrayRef]) -> 
Result<ArrayRef> {
     Ok(Arc::new(boolean_builder.finish()))
 }
 
+macro_rules! array_sum_internal(
+    ($ARRAY:expr, $ARRAY_TYPE:ident, $DATA_TYPE:ident) => {{
+        let mut values = vec![];
+        for arr in $ARRAY.iter() {
+            if let Some(arr) = arr {
+                let arr = downcast_arg!(arr, $ARRAY_TYPE);
+                let sum = arr.values().iter().sum::<$DATA_TYPE>();
+                values.push(sum);
+            }
+        }
+        Ok(Arc::new($ARRAY_TYPE::from(values)))
+    }}
+);
+
+/// array_aggregate SQL function
+pub fn array_aggregate(args: &[ArrayRef]) -> Result<ArrayRef> {
+    assert_eq!(args.len(), 2);
+    let func_name = args[1].as_string::<i32>().value(0);
+    let args = &args[0..1];
+    match func_name {
+        "sum" => array_sum(args),

Review Comment:
   If you implement this in terms of an 
[`AggregateExpr`](https://docs.rs/datafusion/latest/datafusion/physical_expr/trait.AggregateExpr.html)
 and then 
[Accumulator](https://docs.rs/datafusion/latest/datafusion/logical_expr/trait.Accumulator.html#)
 you could use the existing aggregate implementations. This would have several 
benefits:
   1. We would avoid code duplication and get all existing aggregates "for free"
   2. As new aggregates were added they would also be usable as an array 
aggregate
   3. They would work for non primitive types (e.g. DecimalArray, strings, etc)
   
   The basic idea would be to look up the [aggregate 
expr](https://docs.rs/datafusion/latest/datafusion/logical_expr/enum.Expr.html#variant.AggregateFunction)
 during the analysis / parsing phase, create a physical version, and then use 
it here to instantiate the accumulator
   
   Maybe you could prototype this approach by simply hard coding the mapping 
`sum` --> 
[[`AggregateFunction::sum`]](https://docs.rs/datafusion/latest/datafusion/physical_expr/aggregate/build_in/enum.AggregateFunction.html)
 here and using that to create an `Accumulator` 🤔 
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to