jorgecarleitao commented on a change in pull request #7971:
URL: https://github.com/apache/arrow/pull/7971#discussion_r471899766
##########
File path: rust/datafusion/src/execution/physical_plan/udf.rs
##########
@@ -146,3 +154,99 @@ impl PhysicalExpr for ScalarFunctionExpr {
(fun)(&inputs)
}
}
+
+/// A generic aggregate function
+/*
+An aggregate function accepts an arbitrary number of arguments, of arbitrary data types,
+and returns an arbitrary type based on the incoming types.
+
+It is the developer of the function's responsibility to ensure that the aggregator correctly handles the different
+types that are presented to them, and that the return type correctly matches the type returned by the
+aggregator.
+
+It is the user of the function's responsibility to pass arguments to the function that have valid types.
+*/
+#[derive(Clone)]
+pub struct AggregateFunction {
+ /// Function name
+ pub name: String,
+ /// A list of arguments and their respective types. A function can accept more than one type as argument
+ /// (e.g. sum(i8), sum(u8)).
+ pub arg_types: Vec<Vec<DataType>>,
+ /// Return type. This function takes
+ pub return_type: ReturnType,
Review comment:
This change is under discussion on the mailing list.
Essentially, the question is whether we should allow UDFs to have an
input-dependent return type or not (should this be a function or a DataType).
If we decide not to accept input-dependent types, then UDFs are simpler
(multiple input types, single output type), but we can't rewrite our
aggregates as UDFs.
If we decide to accept input-dependent types, then UDFs are more complex
(multiple input types, multiple output types), and we can unify them all under
a single interface.
We can also do something in the middle, in which we declare an interface for
functions on our end that supports (multiple input types, multiple output types),
but only expose public interfaces to register (multiple input types, single
output type) UDFs.
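To make the three options concrete, here is a rough sketch of how they could look.
It is not the code in this PR: `DataType` is a reduced stand-in for Arrow's enum,
and `FixedReturnUdf`, `DynamicReturnUdf`, `ReturnTypeFn`, `register_fixed` and
`sum_return_type` are hypothetical names used only for illustration. The widening
rule for `sum` is also an assumption, not necessarily what DataFusion does.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for Arrow's `DataType`, reduced to a few variants.
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Int8,
    Int64,
    UInt8,
    UInt64,
    Float64,
}

/// Option 1: multiple input types, single output type.
#[allow(dead_code)]
pub struct FixedReturnUdf {
    pub name: String,
    pub arg_types: Vec<Vec<DataType>>,
    pub return_type: DataType,
}

/// Option 2: the output type is a function of the input types.
pub type ReturnTypeFn =
    Arc<dyn Fn(&[DataType]) -> Result<DataType, String> + Send + Sync>;

#[allow(dead_code)]
pub struct DynamicReturnUdf {
    pub name: String,
    pub arg_types: Vec<Vec<DataType>>,
    pub return_type: ReturnTypeFn,
}

/// Option 3 (middle ground): the internal interface is the dynamic one, but
/// the public registration path only accepts a fixed return type and wraps
/// it in a constant function.
pub fn register_fixed(udf: FixedReturnUdf) -> DynamicReturnUdf {
    let fixed = udf.return_type;
    DynamicReturnUdf {
        name: udf.name,
        arg_types: udf.arg_types,
        return_type: Arc::new(
            move |_: &[DataType]| -> Result<DataType, String> { Ok(fixed.clone()) },
        ),
    }
}

/// An input-dependent return type for a built-in aggregate such as `sum`.
/// Assumed rule for illustration: integers widen to 64 bits.
pub fn sum_return_type(args: &[DataType]) -> Result<DataType, String> {
    match args {
        [DataType::Int8] | [DataType::Int64] => Ok(DataType::Int64),
        [DataType::UInt8] | [DataType::UInt64] => Ok(DataType::UInt64),
        [DataType::Float64] => Ok(DataType::Float64),
        other => Err(format!("sum does not support {:?}", other)),
    }
}
```

With this shape, a built-in `sum` would be constructed internally with
`return_type: Arc::new(sum_return_type)`, while a user-registered UDF would go
through `register_fixed` and always report the same type regardless of its inputs.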