alamb commented on code in PR #9172:
URL: https://github.com/apache/arrow-datafusion/pull/9172#discussion_r1484789122


##########
docs/source/library-user-guide/adding-udfs.md:
##########
@@ -34,7 +34,74 @@ First we'll talk about adding an Scalar UDF end-to-end, then 
we'll talk about th
 
 ## Adding a Scalar UDF
 
-A Scalar UDF is a function that takes a row of data and returns a single 
value. For example, this function takes a single i64 and returns a single i64 
with 1 added to it:
+A Scalar UDF is a function that takes a row of data and returns a single 
value. In the example, we will add a function takes a single i64 and returns a 
single i64 with 1 added to it:
+
+For brevity, we'll skipped some error handling, but e.g. you may want to check 
that `args.len()` is the expected number of arguments.
+
+### Adding by `impl ScalarUDFImpl`
+
+This a lower level API with more functionality but is more complex, also 
documented in [`advanced_udf.rs`].
+
+```rust
+use std::any::Any;
+use arrow::datatypes::DataType;
+use datafusion_common::{DataFusionError, plan_err, Result};
+use datafusion_expr::{col, ColumnarValue, Signature, Volatility};
+use datafusion_expr::{ScalarUDFImpl, ScalarUDF};
+
+#[derive(Debug)]
+struct AddOne {
+    signature: Signature
+};
+
+impl AddOne {
+    fn new() -> Self {
+        Self {
+            signature: Signature::uniform(1, vec![DataType::Int32], 
Volatility::Immutable)
+        }
+    }
+}
+
+/// Implement the ScalarUDFImpl trait for AddOne
+impl ScalarUDFImpl for AddOne {
+    fn as_any(&self) -> &dyn Any { self }
+    fn name(&self) -> &str { "add_one" }
+    fn signature(&self) -> &Signature { &self.signature }
+    fn return_type(&self, args: &[DataType]) -> Result<DataType> {
+      if !matches!(args.get(0), Some(&DataType::Int32)) {
+        return plan_err!("add_one only accepts Int32 arguments");
+      }
+      Ok(DataType::Int32)
+    }
+    // The actual implementation would add one to the argument
+    fn invoke(&self, args: &[ColumnarValue]) -> Result<ColumnarValue> {
+        let args = columnar_values_to_array(args)?;
+        let i64s = as_int64_array(&args[0])?;
+
+        let new_array = i64s
+        .iter()
+        .map(|array_elem| array_elem.map(|value| value + 1))
+        .collect::<Int64Array>();
+        Ok(Arc::new(new_array))
+    }
+ }
+
+// Create a new ScalarUDF from the implementation
+let add_one = ScalarUDF::from(AddOne::new());
+
+// register the UDF with the context so it can be invoked by name and from SQL
+let mut ctx = SessionContext::new();
+ctx.register_udf(add_one.clone());
+
+// Call the function `add_one(col)`
+let expr = add_one.call(vec![col("a")]);
+```
+
+### Adding a Scalar UDF by [`create_udf`]
+
+[`create_udf`] is easy to use, but it still requires Arc's of anonymous 
functions, and its function is limited.

Review Comment:
   As you mention, it probably is reasonable not to spend too much time 
commenting on this method. How about this for wording
   
   ```suggestion
   ### Adding a Scalar UDF by [`create_udf`]
   
   There is a an older, more concise, but also more limited API [`create_udf`] 
available as well
   ```



##########
docs/source/library-user-guide/adding-udfs.md:
##########
@@ -34,7 +34,74 @@ First we'll talk about adding an Scalar UDF end-to-end, then 
we'll talk about th
 
 ## Adding a Scalar UDF
 
-A Scalar UDF is a function that takes a row of data and returns a single 
value. For example, this function takes a single i64 and returns a single i64 
with 1 added to it:
+A Scalar UDF is a function that takes a row of data and returns a single 
value. In the example, we will add a function takes a single i64 and returns a 
single i64 with 1 added to it:

Review Comment:
   ```suggestion
   A Scalar UDF is a function that takes a row of data and returns a single 
value. In order for good performance
   such functions are "vectorized" in DataFusion, meaning they get one or more 
Arrow Arrays as input and produce
   an Arrow Array with the same number of rows as output.
   
   To create a Scalar UDF, you
   1.  Implement the `ScalarUDFImpl` trait to tell DataFusion about your 
function such as what types of arguments it takes and how to calculate the 
results. 
   2. Create a `ScalarUDF` and register it with `SessionContext::register_udf` 
so it can be invoked by name.
   
   In the following example, we will add a function takes a single i64 and 
returns a single i64 with 1 added to it:
   ```



##########
docs/source/library-user-guide/adding-udfs.md:
##########
@@ -74,11 +139,10 @@ assert_eq!(result, &Int64Array::from(vec![Some(2), None, 
Some(4)]));
 
 The challenge however is that DataFusion doesn't know about this function. We 
need to register it with DataFusion so that it can be used in the context of a 
query.
 
-### Registering a Scalar UDF
+#### Registering a Scalar UDF
 
 To register a Scalar UDF, you need to wrap the function implementation in a 
[`ScalarUDF`] struct and then register it with the `SessionContext`.
 DataFusion provides the [`create_udf`] and helper functions to make this 
easier.
-There is a lower level API with more functionality but is more complex, that 
is documented in [`advanced_udf.rs`].

Review Comment:
   I think the value of this library guide is the surrounding text, so it seems 
ok to me to have the code replicated in two places



##########
docs/source/library-user-guide/adding-udfs.md:
##########
@@ -34,7 +34,74 @@ First we'll talk about adding an Scalar UDF end-to-end, then 
we'll talk about th
 
 ## Adding a Scalar UDF
 
-A Scalar UDF is a function that takes a row of data and returns a single 
value. For example, this function takes a single i64 and returns a single i64 
with 1 added to it:
+A Scalar UDF is a function that takes a row of data and returns a single 
value. In the example, we will add a function takes a single i64 and returns a 
single i64 with 1 added to it:
+
+For brevity, we'll skipped some error handling, but e.g. you may want to check 
that `args.len()` is the expected number of arguments.
+
+### Adding by `impl ScalarUDFImpl`
+
+This a lower level API with more functionality but is more complex, also 
documented in [`advanced_udf.rs`].
+
+```rust
+use std::any::Any;
+use arrow::datatypes::DataType;
+use datafusion_common::{DataFusionError, plan_err, Result};
+use datafusion_expr::{col, ColumnarValue, Signature, Volatility};
+use datafusion_expr::{ScalarUDFImpl, ScalarUDF};
+
+#[derive(Debug)]
+struct AddOne {
+    signature: Signature
+};
+
+impl AddOne {
+    fn new() -> Self {
+        Self {
+            signature: Signature::uniform(1, vec![DataType::Int32], 
Volatility::Immutable)
+        }
+    }
+}
+
+/// Implement the ScalarUDFImpl trait for AddOne
+impl ScalarUDFImpl for AddOne {
+    fn as_any(&self) -> &dyn Any { self }
+    fn name(&self) -> &str { "add_one" }
+    fn signature(&self) -> &Signature { &self.signature }
+    fn return_type(&self, args: &[DataType]) -> Result<DataType> {
+      if !matches!(args.get(0), Some(&DataType::Int32)) {
+        return plan_err!("add_one only accepts Int32 arguments");
+      }
+      Ok(DataType::Int32)
+    }
+    // The actual implementation would add one to the argument
+    fn invoke(&self, args: &[ColumnarValue]) -> Result<ColumnarValue> {
+        let args = columnar_values_to_array(args)?;
+        let i64s = as_int64_array(&args[0])?;
+
+        let new_array = i64s
+        .iter()
+        .map(|array_elem| array_elem.map(|value| value + 1))
+        .collect::<Int64Array>();
+        Ok(Arc::new(new_array))
+    }
+ }
+

Review Comment:
   I suggest ending the first code block here and then adding a sentence:
   
   > We now need to register the function with DataFusion so that it can be 
used in the context of a query.
   
   (then rest of the example)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to