alamb commented on code in PR #7978:
URL: https://github.com/apache/arrow-datafusion/pull/7978#discussion_r1378086115


##########
datafusion/expr/src/udf.rs:
##########
@@ -17,12 +17,67 @@
 
 //! Udf module contains foundational types that are used to represent UDFs in 
DataFusion.
 
-use crate::{Expr, ReturnTypeFunction, ScalarFunctionImplementation, Signature};
+use crate::{
+    ColumnarValue, Expr, FuncMonotonicity, ReturnTypeFunction,
+    ScalarFunctionImplementation, Signature, TypeSignature, Volatility,
+};
+use arrow::array::ArrayRef;
+use arrow::datatypes::DataType;
+use datafusion_common::{internal_err, DataFusionError, Result};
+use std::any::Any;
 use std::fmt;
 use std::fmt::Debug;
 use std::fmt::Formatter;
 use std::sync::Arc;
 
+// TODO(PR): add doc comments
+pub trait ScalarFunctionDef: Any + Sync + Send + std::fmt::Debug {

Review Comment:
   What do you think about calling this trait `ScalarFunction` rather than 
`ScalarFunctionDef`? I know there are already several other things called 
`ScalarFunction` but that would also keep it in line with things like 
`WindowFunction` 
https://docs.rs/datafusion/latest/datafusion/index.html?search=WindowFunction 



##########
datafusion/expr/src/udf.rs:
##########
@@ -17,12 +17,67 @@
 
 //! Udf module contains foundational types that are used to represent UDFs in 
DataFusion.
 
-use crate::{Expr, ReturnTypeFunction, ScalarFunctionImplementation, Signature};
+use crate::{
+    ColumnarValue, Expr, FuncMonotonicity, ReturnTypeFunction,
+    ScalarFunctionImplementation, Signature, TypeSignature, Volatility,
+};
+use arrow::array::ArrayRef;
+use arrow::datatypes::DataType;
+use datafusion_common::{internal_err, DataFusionError, Result};
+use std::any::Any;
 use std::fmt;
 use std::fmt::Debug;
 use std::fmt::Formatter;
 use std::sync::Arc;
 
+// TODO(PR): add doc comments
+pub trait ScalarFunctionDef: Any + Sync + Send + std::fmt::Debug {
+    /// Return as [`Any`] so that it can be
+    /// downcast to a specific implementation.
+    fn as_any(&self) -> &dyn Any;
+
+    // May return 1 or more name as aliasing
+    fn name(&self) -> &[&str];

Review Comment:
   I recommend `name(&self) ->&str` that returns one name, and then a second 
API that returns optional aliases
   
   ```
   /// returns any alias names this function is known by. Defaults to empty list
   fn aliases(&self) -> &[&str] { &[] }
   ``



##########
datafusion/expr/src/expr.rs:
##########
@@ -150,6 +151,9 @@ pub enum Expr {
     Sort(Sort),
     /// Represents the call of a built-in scalar function with a set of 
arguments.
     ScalarFunction(ScalarFunction),
+    /// Represents the call of a built-in scalar function with a set of 
arguments,

Review Comment:
   What do you think about calling this `ScalarFunction2` or 
`ScalarFunctionDyn` to try and describe the difference? It might be confusing 
to see `ScalarFunction` and `ScalarFunctionDef`
   



##########
datafusion/expr/src/built_in_function.rs:
##########
@@ -1550,6 +1551,46 @@ impl FromStr for BuiltinScalarFunction {
     }
 }
 
+/// `ScalarFunctionDef` is the new interface for builtin scalar functions
+/// This is an adapter between the old and new interface, to use the new 
interface
+/// for internal execution. Functions are planned to move into new interface 
gradually
+/// The function body (`execute()` in `ScalarFunctionDef`) now are all defined 
in
+/// `physical-expr` crate, so the new interface implementation are defined 
separately
+/// in `BuiltinScalarFunctionWrapper`
+impl ScalarFunctionDef for BuiltinScalarFunction {

Review Comment:
   I don't think we'll be able to `impl` this as long as 
`BuiltInScalarFunction` is split across two crates. 
   
   



##########
datafusion/expr/src/udf.rs:
##########
@@ -17,12 +17,67 @@
 
 //! Udf module contains foundational types that are used to represent UDFs in 
DataFusion.
 
-use crate::{Expr, ReturnTypeFunction, ScalarFunctionImplementation, Signature};
+use crate::{
+    ColumnarValue, Expr, FuncMonotonicity, ReturnTypeFunction,
+    ScalarFunctionImplementation, Signature, TypeSignature, Volatility,
+};
+use arrow::array::ArrayRef;
+use arrow::datatypes::DataType;
+use datafusion_common::{internal_err, DataFusionError, Result};
+use std::any::Any;
 use std::fmt;
 use std::fmt::Debug;
 use std::fmt::Formatter;
 use std::sync::Arc;
 
+// TODO(PR): add doc comments
+pub trait ScalarFunctionDef: Any + Sync + Send + std::fmt::Debug {
+    /// Return as [`Any`] so that it can be
+    /// downcast to a specific implementation.
+    fn as_any(&self) -> &dyn Any;
+
+    // May return 1 or more name as aliasing
+    fn name(&self) -> &[&str];
+
+    fn input_type(&self) -> TypeSignature;
+
+    fn return_type(&self) -> FunctionReturnType;
+
+    fn execute(&self, _args: &[ArrayRef]) -> Result<ArrayRef> {

Review Comment:
   I think this should be
   
   ```suggestion
       fn execute(&self, _args: &[ColumnarValue]) -> Result<ColumnarValue> {
   ```



##########
datafusion/expr/src/expr.rs:
##########
@@ -150,6 +151,9 @@ pub enum Expr {
     Sort(Sort),
     /// Represents the call of a built-in scalar function with a set of 
arguments.
     ScalarFunction(ScalarFunction),
+    /// Represents the call of a built-in scalar function with a set of 
arguments,

Review Comment:
   Rather than add a parallel implementation I would love to just change
   
   ```
   #[derive(Clone, PartialEq, Eq, Hash, Debug)]
   pub struct ScalarFunction {
       /// The function
       pub fun: built_in_function::BuiltinScalarFunction,
       /// List of expressions to feed to the functions as arguments
       pub args: Vec<Expr>,
   }
   ```
   to something like 
   ```
   #[derive(Clone, PartialEq, Eq, Hash, Debug)]
   pub struct ScalarFunction {
       /// The function
       pub fun: Arc<dyn ScalarFunctionDef>,
       /// List of expressions to feed to the functions as arguments
       pub args: Vec<Expr>,
   }
   ```
   
   
   
   



##########
datafusion/expr/src/udf.rs:
##########
@@ -17,12 +17,67 @@
 
 //! Udf module contains foundational types that are used to represent UDFs in 
DataFusion.
 
-use crate::{Expr, ReturnTypeFunction, ScalarFunctionImplementation, Signature};
+use crate::{
+    ColumnarValue, Expr, FuncMonotonicity, ReturnTypeFunction,
+    ScalarFunctionImplementation, Signature, TypeSignature, Volatility,
+};
+use arrow::array::ArrayRef;
+use arrow::datatypes::DataType;
+use datafusion_common::{internal_err, DataFusionError, Result};
+use std::any::Any;
 use std::fmt;
 use std::fmt::Debug;
 use std::fmt::Formatter;
 use std::sync::Arc;
 
+// TODO(PR): add doc comments
+pub trait ScalarFunctionDef: Any + Sync + Send + std::fmt::Debug {
+    /// Return as [`Any`] so that it can be
+    /// downcast to a specific implementation.
+    fn as_any(&self) -> &dyn Any;
+
+    // May return 1 or more name as aliasing
+    fn name(&self) -> &[&str];
+
+    fn input_type(&self) -> TypeSignature;
+
+    fn return_type(&self) -> FunctionReturnType;
+
+    fn execute(&self, _args: &[ArrayRef]) -> Result<ArrayRef> {
+        internal_err!("This method should be implemented if 
`supports_execute_raw()` returns `false`")
+    }
+
+    fn volatility(&self) -> Volatility;
+
+    fn monotonicity(&self) -> Option<FuncMonotonicity>;
+
+    // ===============================
+    // OPTIONAL METHODS START BELOW
+    // ===============================
+
+    /// `execute()` and `execute_raw()` are two possible alternative for 
function definition:
+    /// If returns `false`, `execute()` will be used for execution;
+    /// If returns `true`, `execute_raw()` will be called.
+    fn use_execute_raw_instead(&self) -> bool {
+        false
+    }
+
+    /// An alternative function defination than `execute()`
+    fn execute_raw(&self, _args: &[ColumnarValue]) -> Result<ColumnarValue> {
+        internal_err!("This method should be implemented if 
`supports_execute_raw()` returns `true`")
+    }
+}
+
+/// Defines the return type behavior of a function.
+pub enum FunctionReturnType {

Review Comment:
   What do you think about a signature like this:
   
   ```rust
   pub trait ScalarFunctionDef: Any + Sync + Send + std::fmt::Debug {
   ...
       /// What type will this function return, given arguments of the 
specified input types?
       /// By default, returns the same type as the first argument
       fn return_type(&self, arg_types: &[DataType]) -> Result<DataType> {
         arg_types.get(0)
           .ok_or_else(Error ("Implementation of Function {} did not specify a 
return type, and there are no arguments"))
     }
   ...
   }
   ```
   
   Then I think most function implementations can be left as the default or as
   
   ```rust
   impl ScalarFunctionDef for Foo {
       fn return_type(&self, arg_types: &[DataType]) -> Result<DataType> {
         Ok(DataType::Utf8)
       }
   }
   ```



##########
datafusion/expr/src/udf.rs:
##########
@@ -17,12 +17,67 @@
 
 //! Udf module contains foundational types that are used to represent UDFs in 
DataFusion.
 
-use crate::{Expr, ReturnTypeFunction, ScalarFunctionImplementation, Signature};
+use crate::{
+    ColumnarValue, Expr, FuncMonotonicity, ReturnTypeFunction,
+    ScalarFunctionImplementation, Signature, TypeSignature, Volatility,
+};
+use arrow::array::ArrayRef;
+use arrow::datatypes::DataType;
+use datafusion_common::{internal_err, DataFusionError, Result};
+use std::any::Any;
 use std::fmt;
 use std::fmt::Debug;
 use std::fmt::Formatter;
 use std::sync::Arc;
 
+// TODO(PR): add doc comments
+pub trait ScalarFunctionDef: Any + Sync + Send + std::fmt::Debug {
+    /// Return as [`Any`] so that it can be
+    /// downcast to a specific implementation.
+    fn as_any(&self) -> &dyn Any;
+
+    // May return 1 or more name as aliasing
+    fn name(&self) -> &[&str];
+
+    fn input_type(&self) -> TypeSignature;
+
+    fn return_type(&self) -> FunctionReturnType;
+
+    fn execute(&self, _args: &[ArrayRef]) -> Result<ArrayRef> {
+        internal_err!("This method should be implemented if 
`supports_execute_raw()` returns `false`")
+    }
+
+    fn volatility(&self) -> Volatility;
+
+    fn monotonicity(&self) -> Option<FuncMonotonicity>;
+
+    // ===============================
+    // OPTIONAL METHODS START BELOW
+    // ===============================
+
+    /// `execute()` and `execute_raw()` are two possible alternative for 
function definition:
+    /// If returns `false`, `execute()` will be used for execution;
+    /// If returns `true`, `execute_raw()` will be called.
+    fn use_execute_raw_instead(&self) -> bool {

Review Comment:
   I think we should have a single `execute` method that takes `ColumnarValue`s 
and put in the documentation how to go from `ColumnarValue` --> `ArrayRef` for 
simple, initial implementation



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to