alamb commented on code in PR #14094:
URL: https://github.com/apache/datafusion/pull/14094#discussion_r1920358041
##########
datafusion/expr/src/udf.rs:
##########
@@ -342,6 +348,56 @@ pub struct ScalarFunctionArgs<'a> {
pub return_type: &'a DataType,
}
+#[derive(Debug)]
+pub struct ReturnTypeArgs<'a> {
Review Comment:
Here are some suggestions to improve the comments:
```suggestion
/// Arguments passed to [`ScalarUDFImpl::return_type_from_args`]
#[derive(Debug)]
pub struct ReturnTypeArgs<'a> {
```
##########
datafusion/expr/src/udf.rs:
##########
@@ -342,6 +348,56 @@ pub struct ScalarFunctionArgs<'a> {
pub return_type: &'a DataType,
}
+#[derive(Debug)]
+pub struct ReturnTypeArgs<'a> {
+ /// The data types of the arguments to the function
+ pub arg_types: &'a [DataType],
+ /// The Utf8 arguments to the function, if the expression is not Utf8, it
will be empty string
Review Comment:
If possible I suggest using the following type:
1. It can distinguish between string/non-string arguments
3. It avoids clones
```rust
pub arguments: &'a [Option<&'a ScalarValue>]>`,
```
##########
datafusion/expr/src/udf.rs:
##########
@@ -490,6 +547,40 @@ pub trait ScalarUDFImpl: Debug + Send + Sync {
self.return_type(arg_types)
}
+ /// What [`DataType`] will be returned by this function, given the
Review Comment:
I also recommend removing the comments from `return_type_from_exprs` now
that it is deprecated and you have moved the content here
##########
datafusion/expr/src/udf.rs:
##########
@@ -481,6 +537,7 @@ pub trait ScalarUDFImpl: Debug + Send + Sync {
/// This function must consistently return the same type for the same
/// logical input even if the input is simplified (e.g. it must return the
same
/// value for `('foo' | 'bar')` as it does for ('foobar').
+ #[deprecated(since = "45.0.0", note = "Use `return_type_from_args`
instead")]
Review Comment:
Can you also please update the comments on `return_type` to refer to
`return_type_from_args` (it currently talks about `return_type_from_exprs`:
```rust
/// If you provide an implementation for
[`Self::return_type_from_exprs`],
/// DataFusion will not call `return_type` (this function). In this case
it
/// is recommended to return [`DataFusionError::Internal`].
```
##########
datafusion/functions/src/core/named_struct.rs:
##########
@@ -44,11 +44,18 @@ fn named_struct_expr(args: &[ColumnarValue]) ->
Result<ColumnarValue> {
.chunks_exact(2)
.enumerate()
.map(|(i, chunk)| {
-
let name_column = &chunk[0];
let name = match name_column {
- ColumnarValue::Scalar(ScalarValue::Utf8(Some(name_scalar))) =>
name_scalar,
- _ => return exec_err!("named_struct even arguments must be
string literals, got {name_column:?} instead at position {}", i * 2)
+ ColumnarValue::Scalar(ScalarValue::Utf8(Some(name_scalar))) =>
{
+ name_scalar
+ }
+ // TODO: Implement Display for ColumnarValue
Review Comment:
That is a good todo -- maybe we can make a follow on ticket for it
##########
datafusion/functions/src/core/coalesce.rs:
##########
@@ -174,39 +177,3 @@ fn get_coalesce_doc() -> &'static Documentation {
.build()
})
}
-
-#[cfg(test)]
Review Comment:
why remove these tests?
##########
datafusion/physical-expr/src/scalar_function.rs:
##########
@@ -83,6 +86,53 @@ impl ScalarFunctionExpr {
}
}
+ /// Create a new Scalar function
+ pub fn try_new(
+ fun: Arc<ScalarUDF>,
+ args: Vec<Arc<dyn PhysicalExpr>>,
+ schema: &Schema,
+ ) -> Result<Self> {
+ let name = fun.name().to_string();
+ let arg_types = args
+ .iter()
+ .map(|e| e.data_type(schema))
+ .collect::<Result<Vec<_>>>()?;
+
+ // verify that input data types is consistent with function's
`TypeSignature`
+ data_types_with_scalar_udf(&arg_types, &fun)?;
+
+ let nullables = args
+ .iter()
+ .map(|e| e.nullable(schema))
+ .collect::<Result<Vec<_>>>()?;
+
+ let arguments: Vec<String> = args
+ .iter()
+ .map(|e| {
+ e.as_any()
+ .downcast_ref::<Literal>()
+ .map(|literal| match literal.value() {
+ ScalarValue::Utf8(Some(s)) => s.clone(),
Review Comment:
It would be really nice to avoid this clone here -- if we changed the
signature to `&[Option<&str>]` I think that would be easier to use and more
efficient
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]