alamb commented on code in PR #8578:
URL: https://github.com/apache/arrow-datafusion/pull/8578#discussion_r1430517371


##########
datafusion-examples/examples/advanced_udf.rs:
##########
@@ -0,0 +1,243 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use datafusion::{
+    arrow::{
+        array::{ArrayRef, Float32Array, Float64Array},
+        datatypes::DataType,
+        record_batch::RecordBatch,
+    },
+    logical_expr::Volatility,
+};
+use std::any::Any;
+
+use arrow::array::{new_null_array, Array, AsArray};
+use arrow::compute;
+use arrow::datatypes::Float64Type;
+use datafusion::error::Result;
+use datafusion::prelude::*;
+use datafusion_common::{internal_err, ScalarValue};
+use datafusion_expr::{ColumnarValue, ScalarUDF, ScalarUDFImpl, Signature};
+use std::sync::Arc;
+
+/// This example shows how to use the full ScalarUDFImpl API to implement a 
user

Review Comment:
   I wanted to create an example that shows how to make a more advanced UDF 
that special cases constant values.
   
   This also shows how to create a ScalarUDF using a trait (rather than free 
functions and closures) 



##########
datafusion/expr/src/udf.rs:
##########
@@ -124,22 +169,116 @@ impl ScalarUDF {
         &self.aliases
     }
 
-    /// Returns this function's signature (what input types are accepted)
+    /// Returns this function's [`Signature`] (what input types are accepted)
     pub fn signature(&self) -> &Signature {
         &self.signature
     }
 
-    /// Return the type of the function given its input types
+    /// The datatype this function returns given the input argument input types
     pub fn return_type(&self, args: &[DataType]) -> Result<DataType> {
         // Old API returns an Arc of the datatype for some reason
         let res = (self.return_type)(args)?;
         Ok(res.as_ref().clone())
     }
 
-    /// Return the actual implementation
+    /// Return an [`Arc`] to the function implementation
     pub fn fun(&self) -> ScalarFunctionImplementation {
         self.fun.clone()
     }
+}
+
+impl<F> From<F> for ScalarUDF
+where
+    F: ScalarUDFImpl + Send + Sync + 'static,
+{
+    fn from(fun: F) -> Self {
+        Self::new_from_trait(fun)
+    }
+}
+
+/// Trait for implementing [`ScalarUDF`].

Review Comment:
   Here is the proposed new trait. I think we can use this trait to add things 
such as  "pre-compiling" arguments 
https://github.com/apache/arrow-datafusion/issues/8051 and adding better 
examples / documentation add examples and description to scalar/aggregate 
functions  #8366.
   
   cc @universalmind303 for your comments



##########
datafusion/expr/src/udf.rs:
##########
@@ -79,7 +90,11 @@ impl std::hash::Hash for ScalarUDF {
 }
 
 impl ScalarUDF {
-    /// Create a new ScalarUDF
+    /// Create a new ScalarUDF from low level details.
+    ///
+    /// See  [`ScalarUDFImpl`] for a more convenient way to create a
+    /// `ScalarUDF` using trait objects
+    #[deprecated(since = "34.0.0", note = "please implement ScalarUDFImpl 
instead")]

Review Comment:
   I think this low level API is quite akward to use and very hard to extend in 
backwards compatible ways. The trait is easer to use and easier to extend. 
   
   Thus I propose marking this API as deprecated (note most of the examples in 
codebase use `create_udf` rather than `ScalarUDF:new()` directly



##########
datafusion/expr/src/udf.rs:
##########
@@ -95,6 +110,34 @@ impl ScalarUDF {
         }
     }
 
+    /// Create a new `ScalarUDF` from a `[ScalarUDFImpl]` trait object
+    ///
+    /// Note this is the same as using the `From` impl (`ScalarUDF::from`)
+    pub fn new_from_trait<F>(fun: F) -> ScalarUDF
+    where
+        F: ScalarUDFImpl + Send + Sync + 'static,
+    {
+        // TODO change the internal implementation to use the trait object

Review Comment:
   I plan to improve the internal representation as a follow on PR



##########
datafusion/expr/src/expr.rs:
##########
@@ -1848,24 +1848,41 @@ mod test {
         );
 
         // UDF
-        let return_type: ReturnTypeFunction =
-            Arc::new(move |_| Ok(Arc::new(DataType::Utf8)));
-        let fun: ScalarFunctionImplementation =
-            Arc::new(move |_| 
Ok(ColumnarValue::Scalar(ScalarValue::new_utf8("a"))));
-        let udf = Arc::new(ScalarUDF::new(
-            "TestScalarUDF",
-            &Signature::uniform(1, vec![DataType::Float32], 
Volatility::Stable),
-            &return_type,
-            &fun,
-        ));
+        struct TestScalarUDF {

Review Comment:
   This shows an example of the difference in trait based vs low level 
`ScalarValue::new` API that I propose to deprecate
   
   While the trait requires more lines, I think it is much easier to implement 
as it is simply a standard trait implementation which I believe is far more 
common than Arc'd closures



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to