alamb commented on code in PR #7978:
URL: https://github.com/apache/arrow-datafusion/pull/7978#discussion_r1378086115
##########
datafusion/expr/src/udf.rs:
##########
@@ -17,12 +17,67 @@
//! Udf module contains foundational types that are used to represent UDFs in
DataFusion.
-use crate::{Expr, ReturnTypeFunction, ScalarFunctionImplementation, Signature};
+use crate::{
+ ColumnarValue, Expr, FuncMonotonicity, ReturnTypeFunction,
+ ScalarFunctionImplementation, Signature, TypeSignature, Volatility,
+};
+use arrow::array::ArrayRef;
+use arrow::datatypes::DataType;
+use datafusion_common::{internal_err, DataFusionError, Result};
+use std::any::Any;
use std::fmt;
use std::fmt::Debug;
use std::fmt::Formatter;
use std::sync::Arc;
+// TODO(PR): add doc comments
+pub trait ScalarFunctionDef: Any + Sync + Send + std::fmt::Debug {
Review Comment:
What do you think about calling this trait `ScalarFunction` rather than
`ScalarFunctionDef`? I know there are already several other things called
`ScalarFunction` but that would also keep it in line with things like
`WindowFunction`
https://docs.rs/datafusion/latest/datafusion/index.html?search=WindowFunction
##########
datafusion/expr/src/udf.rs:
##########
@@ -17,12 +17,67 @@
//! Udf module contains foundational types that are used to represent UDFs in
DataFusion.
-use crate::{Expr, ReturnTypeFunction, ScalarFunctionImplementation, Signature};
+use crate::{
+ ColumnarValue, Expr, FuncMonotonicity, ReturnTypeFunction,
+ ScalarFunctionImplementation, Signature, TypeSignature, Volatility,
+};
+use arrow::array::ArrayRef;
+use arrow::datatypes::DataType;
+use datafusion_common::{internal_err, DataFusionError, Result};
+use std::any::Any;
use std::fmt;
use std::fmt::Debug;
use std::fmt::Formatter;
use std::sync::Arc;
+// TODO(PR): add doc comments
+pub trait ScalarFunctionDef: Any + Sync + Send + std::fmt::Debug {
+ /// Return as [`Any`] so that it can be
+ /// downcast to a specific implementation.
+ fn as_any(&self) -> &dyn Any;
+
+ // May return 1 or more name as aliasing
+ fn name(&self) -> &[&str];
Review Comment:
I recommend `name(&self) ->&str` that returns one name, and then a second
API that returns optional aliases
```
/// returns any alias names this function is known by. Defaults to empty list
fn aliases(&self) -> &[&str] { &[] }
``
##########
datafusion/expr/src/expr.rs:
##########
@@ -150,6 +151,9 @@ pub enum Expr {
Sort(Sort),
/// Represents the call of a built-in scalar function with a set of
arguments.
ScalarFunction(ScalarFunction),
+ /// Represents the call of a built-in scalar function with a set of
arguments,
Review Comment:
What do you think about calling this `ScalarFunction2` or
`ScalarFunctionDyn` to try and describe the difference? It might be confusing
to see `ScalarFunction` and `ScalarFunctionDef`
##########
datafusion/expr/src/built_in_function.rs:
##########
@@ -1550,6 +1551,46 @@ impl FromStr for BuiltinScalarFunction {
}
}
+/// `ScalarFunctionDef` is the new interface for builtin scalar functions
+/// This is an adapter between the old and new interface, to use the new
interface
+/// for internal execution. Functions are planned to move into new interface
gradually
+/// The function body (`execute()` in `ScalarFunctionDef`) now are all defined
in
+/// `physical-expr` crate, so the new interface implementation are defined
separately
+/// in `BuiltinScalarFunctionWrapper`
+impl ScalarFunctionDef for BuiltinScalarFunction {
Review Comment:
I don't think we'll be able to `impl` this as long as
`BuiltInScalarFunction` is split across two crates.
##########
datafusion/expr/src/udf.rs:
##########
@@ -17,12 +17,67 @@
//! Udf module contains foundational types that are used to represent UDFs in
DataFusion.
-use crate::{Expr, ReturnTypeFunction, ScalarFunctionImplementation, Signature};
+use crate::{
+ ColumnarValue, Expr, FuncMonotonicity, ReturnTypeFunction,
+ ScalarFunctionImplementation, Signature, TypeSignature, Volatility,
+};
+use arrow::array::ArrayRef;
+use arrow::datatypes::DataType;
+use datafusion_common::{internal_err, DataFusionError, Result};
+use std::any::Any;
use std::fmt;
use std::fmt::Debug;
use std::fmt::Formatter;
use std::sync::Arc;
+// TODO(PR): add doc comments
+pub trait ScalarFunctionDef: Any + Sync + Send + std::fmt::Debug {
+ /// Return as [`Any`] so that it can be
+ /// downcast to a specific implementation.
+ fn as_any(&self) -> &dyn Any;
+
+ // May return 1 or more name as aliasing
+ fn name(&self) -> &[&str];
+
+ fn input_type(&self) -> TypeSignature;
+
+ fn return_type(&self) -> FunctionReturnType;
+
+ fn execute(&self, _args: &[ArrayRef]) -> Result<ArrayRef> {
Review Comment:
I think this should be
```suggestion
fn execute(&self, _args: &[ColumnarValue]) -> Result<ColumnarValue> {
```
##########
datafusion/expr/src/expr.rs:
##########
@@ -150,6 +151,9 @@ pub enum Expr {
Sort(Sort),
/// Represents the call of a built-in scalar function with a set of
arguments.
ScalarFunction(ScalarFunction),
+ /// Represents the call of a built-in scalar function with a set of
arguments,
Review Comment:
Rather than add a parallel implementation I would love to just change
```
#[derive(Clone, PartialEq, Eq, Hash, Debug)]
pub struct ScalarFunction {
/// The function
pub fun: built_in_function::BuiltinScalarFunction,
/// List of expressions to feed to the functions as arguments
pub args: Vec<Expr>,
}
```
to something like
```
#[derive(Clone, PartialEq, Eq, Hash, Debug)]
pub struct ScalarFunction {
/// The function
pub fun: Arc<dyn ScalarFunctionDef>,
/// List of expressions to feed to the functions as arguments
pub args: Vec<Expr>,
}
```
##########
datafusion/expr/src/udf.rs:
##########
@@ -17,12 +17,67 @@
//! Udf module contains foundational types that are used to represent UDFs in
DataFusion.
-use crate::{Expr, ReturnTypeFunction, ScalarFunctionImplementation, Signature};
+use crate::{
+ ColumnarValue, Expr, FuncMonotonicity, ReturnTypeFunction,
+ ScalarFunctionImplementation, Signature, TypeSignature, Volatility,
+};
+use arrow::array::ArrayRef;
+use arrow::datatypes::DataType;
+use datafusion_common::{internal_err, DataFusionError, Result};
+use std::any::Any;
use std::fmt;
use std::fmt::Debug;
use std::fmt::Formatter;
use std::sync::Arc;
+// TODO(PR): add doc comments
+pub trait ScalarFunctionDef: Any + Sync + Send + std::fmt::Debug {
+ /// Return as [`Any`] so that it can be
+ /// downcast to a specific implementation.
+ fn as_any(&self) -> &dyn Any;
+
+ // May return 1 or more name as aliasing
+ fn name(&self) -> &[&str];
+
+ fn input_type(&self) -> TypeSignature;
+
+ fn return_type(&self) -> FunctionReturnType;
+
+ fn execute(&self, _args: &[ArrayRef]) -> Result<ArrayRef> {
+ internal_err!("This method should be implemented if
`supports_execute_raw()` returns `false`")
+ }
+
+ fn volatility(&self) -> Volatility;
+
+ fn monotonicity(&self) -> Option<FuncMonotonicity>;
+
+ // ===============================
+ // OPTIONAL METHODS START BELOW
+ // ===============================
+
+ /// `execute()` and `execute_raw()` are two possible alternative for
function definition:
+ /// If returns `false`, `execute()` will be used for execution;
+ /// If returns `true`, `execute_raw()` will be called.
+ fn use_execute_raw_instead(&self) -> bool {
+ false
+ }
+
+ /// An alternative function defination than `execute()`
+ fn execute_raw(&self, _args: &[ColumnarValue]) -> Result<ColumnarValue> {
+ internal_err!("This method should be implemented if
`supports_execute_raw()` returns `true`")
+ }
+}
+
+/// Defines the return type behavior of a function.
+pub enum FunctionReturnType {
Review Comment:
What do you think about a signature like this:
```rust
pub trait ScalarFunctionDef: Any + Sync + Send + std::fmt::Debug {
...
/// What type will this function return, given arguments of the
specified input types?
/// By default, returns the same type as the first argument
fn return_type(&self, arg_types: &[DataType]) -> Result<DataType> {
arg_types.get(0)
.ok_or_else(Error ("Implementation of Function {} did not specify a
return type, and there are no arguments"))
}
...
}
```
Then I think most function implementations can be left as the default or as
```rust
impl ScalarFunctionDef for Foo {
fn return_type(&self, arg_types: &[DataType]) -> Result<DataType> {
Ok(DataType::Utf8)
}
}
```
##########
datafusion/expr/src/udf.rs:
##########
@@ -17,12 +17,67 @@
//! Udf module contains foundational types that are used to represent UDFs in
DataFusion.
-use crate::{Expr, ReturnTypeFunction, ScalarFunctionImplementation, Signature};
+use crate::{
+ ColumnarValue, Expr, FuncMonotonicity, ReturnTypeFunction,
+ ScalarFunctionImplementation, Signature, TypeSignature, Volatility,
+};
+use arrow::array::ArrayRef;
+use arrow::datatypes::DataType;
+use datafusion_common::{internal_err, DataFusionError, Result};
+use std::any::Any;
use std::fmt;
use std::fmt::Debug;
use std::fmt::Formatter;
use std::sync::Arc;
+// TODO(PR): add doc comments
+pub trait ScalarFunctionDef: Any + Sync + Send + std::fmt::Debug {
+ /// Return as [`Any`] so that it can be
+ /// downcast to a specific implementation.
+ fn as_any(&self) -> &dyn Any;
+
+ // May return 1 or more name as aliasing
+ fn name(&self) -> &[&str];
+
+ fn input_type(&self) -> TypeSignature;
+
+ fn return_type(&self) -> FunctionReturnType;
+
+ fn execute(&self, _args: &[ArrayRef]) -> Result<ArrayRef> {
+ internal_err!("This method should be implemented if
`supports_execute_raw()` returns `false`")
+ }
+
+ fn volatility(&self) -> Volatility;
+
+ fn monotonicity(&self) -> Option<FuncMonotonicity>;
+
+ // ===============================
+ // OPTIONAL METHODS START BELOW
+ // ===============================
+
+ /// `execute()` and `execute_raw()` are two possible alternative for
function definition:
+ /// If returns `false`, `execute()` will be used for execution;
+ /// If returns `true`, `execute_raw()` will be called.
+ fn use_execute_raw_instead(&self) -> bool {
Review Comment:
I think we should have a single `execute` method that takes `ColumnarValue`s
and put in the documentation how to go from `ColumnarValue` --> `ArrayRef` for
simple, initial implementation
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]