alamb commented on code in PR #7752:
URL: https://github.com/apache/arrow-datafusion/pull/7752#discussion_r1349511515


##########
datafusion-examples/Cargo.toml:
##########
@@ -41,6 +41,7 @@ datafusion-common = { path = "../datafusion/common" }
 datafusion-expr = { path = "../datafusion/expr" }
 datafusion-optimizer = { path = "../datafusion/optimizer" }
 datafusion-sql = { path = "../datafusion/sql" }
+datafusion-extension-test-scalar-func = {path = 
"../extension/scalar-function/test-func"}

Review Comment:
   Since most of DataFusion is extensions, another potential might be  a path 
structure like the following? 
   
   ```
   ├── core #  (already exists)
   ├── functions-aggregate-core
   ├── functions-aggregate-statistics
   ├── functions-scalar-array
   ├── functions-scalar-core
   ├── functions-scalar-crypto
   ├── functions-scalar-timestamp
   ├── physical-expr #  (already exists)
   ...
   └── physical-plan #  (already exists)
   ```



##########
datafusion/expr/src/udf.rs:
##########
@@ -18,11 +18,30 @@
 //! Udf module contains foundational types that are used to represent UDFs in 
DataFusion.
 
 use crate::{Expr, ReturnTypeFunction, ScalarFunctionImplementation, Signature};
+use arrow::array::ArrayRef;
+use datafusion_common::Result;
 use std::fmt;
 use std::fmt::Debug;
 use std::fmt::Formatter;
 use std::sync::Arc;
 
+pub trait ScalarFunctionDef: Sync + Send + std::fmt::Debug {
+    // TODO: support alias
+    fn name(&self) -> &str;
+
+    fn signature(&self) -> Signature;
+
+    // TODO: ReturnTypeFunction -> a ENUM
+    //     most function's return type is either the same as 1st arg or a 
fixed type
+    fn return_type(&self) -> ReturnTypeFunction;

Review Comment:
   I think the current API requires return_type to be passed the specific 
argument types
   
   Also, https://github.com/apache/arrow-datafusion/discussions/7657 suggests 
there is additional room for improvement



##########
datafusion-examples/Cargo.toml:
##########
@@ -41,6 +41,7 @@ datafusion-common = { path = "../datafusion/common" }
 datafusion-expr = { path = "../datafusion/expr" }
 datafusion-optimizer = { path = "../datafusion/optimizer" }
 datafusion-sql = { path = "../datafusion/sql" }
+datafusion-extension-test-scalar-func = {path = 
"../extension/scalar-function/test-func"}

Review Comment:
   Since most of DataFusion is extensions, another potential might be  a path 
structure like the following? 
   
   ```
   ├── core #  (already exists)
   ├── functions-aggregate-core
   ├── functions-aggregate-statistics
   ├── functions-scalar-array
   ├── functions-scalar-core
   ├── functions-scalar-crypto
   ├── functions-scalar-timestamp
   ├── physical-expr #  (already exists)
   ...
   └── physical-plan #  (already exists)
   ```



##########
datafusion/expr/src/udf.rs:
##########
@@ -18,11 +18,30 @@
 //! Udf module contains foundational types that are used to represent UDFs in 
DataFusion.
 
 use crate::{Expr, ReturnTypeFunction, ScalarFunctionImplementation, Signature};
+use arrow::array::ArrayRef;
+use datafusion_common::Result;
 use std::fmt;
 use std::fmt::Debug;
 use std::fmt::Formatter;
 use std::sync::Arc;
 
+pub trait ScalarFunctionDef: Sync + Send + std::fmt::Debug {

Review Comment:
   I really like using a trait to define functions. To be honest, I am not sure 
why DataFusion didn't so that originally. 
   
   My only concern is that adding this we would now have *three* ways to define 
scalar functions (the ScalarUDF, `ScalarFunctionDef` and 
`BuiltInScalarFunction`). 
   
   I wonder if this trait is needed, or can we extend `ScalarUDF` to account 
for aliases?
   
   Or perhaps should we be aiming to consolidate *all* functions to use 
`ScalarFuntionDef` 🤔 



##########
datafusion/core/src/execution/context.rs:
##########
@@ -792,6 +794,30 @@ impl SessionContext {
             .add_var_provider(variable_type, provider);
     }
 
+    /// Register a function package into this context
+    pub fn register_scalar_function_package(

Review Comment:
   If we are going to implement a new API, maybe we can have it cover all types 
of functions (window, aggregate, scalar) so there is a place to add table 
functions eventually too:
   
   Something like
   
   ```
       pub fn register_function_package(&self, func_pkg: Box<dyn 
FunctionPackage>)
   ```
   
   Also I wonder if it should take `Box<dyn FunctionPackage>` (which is owned) 
or `Arc<dyn FunctionPackage>` which can be shared. Looking at this code, it 
seems like there is no reason for an owned pointer and we could get away with 
`Arc` (so function packages can be quickly registered with multiple 
`SessionContext`s)



##########
datafusion/expr/src/udf.rs:
##########
@@ -18,11 +18,30 @@
 //! Udf module contains foundational types that are used to represent UDFs in 
DataFusion.
 
 use crate::{Expr, ReturnTypeFunction, ScalarFunctionImplementation, Signature};
+use arrow::array::ArrayRef;
+use datafusion_common::Result;
 use std::fmt;
 use std::fmt::Debug;
 use std::fmt::Formatter;
 use std::sync::Arc;
 
+pub trait ScalarFunctionDef: Sync + Send + std::fmt::Debug {
+    // TODO: support alias
+    fn name(&self) -> &str;
+
+    fn signature(&self) -> Signature;
+
+    // TODO: ReturnTypeFunction -> a ENUM
+    //     most function's return type is either the same as 1st arg or a 
fixed type
+    fn return_type(&self) -> ReturnTypeFunction;
+
+    fn execute(&self, args: &[ArrayRef]) -> Result<ArrayRef>;
+}
+
+pub trait ScalarFunctionPackage {
+    fn functions(&self) -> Vec<Box<dyn ScalarFunctionDef>>;

Review Comment:
   I suggest using `Arc` here as I don't see any reason for the to be state / 
need an owned copy



##########
datafusion/expr/src/udf.rs:
##########
@@ -18,11 +18,30 @@
 //! Udf module contains foundational types that are used to represent UDFs in 
DataFusion.
 
 use crate::{Expr, ReturnTypeFunction, ScalarFunctionImplementation, Signature};
+use arrow::array::ArrayRef;
+use datafusion_common::Result;
 use std::fmt;
 use std::fmt::Debug;
 use std::fmt::Formatter;
 use std::sync::Arc;
 
+pub trait ScalarFunctionDef: Sync + Send + std::fmt::Debug {

Review Comment:
   I really like using a trait to define functions. To be honest, I am not sure 
why DataFusion didn't so that originally. 
   
   My only concern is that adding this we would now have *three* ways to define 
scalar functions (the ScalarUDF, `ScalarFunctionDef` and 
`BuiltInScalarFunction`). 
   
   I wonder if this trait is needed, or can we extend `ScalarUDF` to account 
for aliases?
   
   Or perhaps should we be aiming to consolidate *all* functions to use 
`ScalarFuntionDef` 🤔 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to