alamb commented on issue #8568:
URL: https://github.com/apache/arrow-datafusion/issues/8568#issuecomment-1867610161

   > Ideally, i'd prefer something like this for implementing scalar udfs.
   
   @universalmind303  -- perhaps you could check out 
https://github.com/apache/arrow-datafusion/pull/8578 which proposes the 
following trait for implementing UDFs:
   
   ```rust
   pub trait ScalarUDFImpl {
       /// Returns this object as an [`Any`] trait object
       fn as_any(&self) -> &dyn Any;
   
       /// Returns this function's name
       fn name(&self) -> &str;
   
       /// Returns the function's [`Signature`] for information about what input
       /// types are accepted and the function's Volatility.
       fn signature(&self) -> &Signature;
   
       /// What [`DataType`] will be returned by this function, given the types
       /// of the arguments
       fn return_type(&self, arg_types: &[DataType]) -> Result<DataType>;
   
       /// Invoke the function on `args`, returning the appropriate result
       ///
       /// The function will be invoked with a slice of [`ColumnarValue`]
       /// (either scalar or array).
       ///
       /// # Zero Argument Functions
       /// If the function has zero parameters (e.g. `now()`) it will be passed
       /// a single element slice which is a null array to indicate the batch's
       /// row count (so the function can know the resulting array size).
       ///
       /// # Performance
       ///
       /// For the best performance, the implementations of `invoke` should
       /// handle the common case when one or more of their arguments are
       /// constant values (aka [`ColumnarValue::Scalar`]). Calling
       /// [`ColumnarValue::into_array`] and treating all arguments as arrays
       /// will work, but will be slower.
       fn invoke(&self, args: &[ColumnarValue]) -> Result<ColumnarValue>;
   
       /// Returns any aliases (alternate names) for this function.
       ///
       /// Aliases can be used to invoke the same function using different
       /// names.
       /// For example in some databases `now()` and `current_timestamp()` are
       /// aliases for the same function. This behavior can be obtained by
       /// returning `current_timestamp` as an alias for the `now` function.
       ///
       /// Note: `aliases` should only include names other than [`Self::name`].
       /// Defaults to `[]` (no aliases)
       fn aliases(&self) -> &[String] {
           &[]
       }
   }
   ```
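
To make the shape of the trait concrete, here is a minimal, self-contained sketch of what an implementation might look like. Note the assumptions: `DataType`, `ColumnarValue`, `Signature` and `Result` below are deliberately simplified stand-ins so the example compiles without the `datafusion` crate (they are not the real definitions), and `AddOne` with its `increment` alias is a hypothetical function invented for illustration. The sketch also shows the scalar fast path mentioned in the `# Performance` note.

```rust
use std::any::Any;

// Simplified stand-ins (assumptions), NOT the real DataFusion types.
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Int64,
}

#[derive(Debug, Clone, PartialEq)]
pub enum ColumnarValue {
    Scalar(Option<i64>),
    Array(Vec<Option<i64>>),
}

pub struct Signature {
    pub arg_types: Vec<DataType>,
}

pub type Result<T> = std::result::Result<T, String>;

// Same method set as the proposed trait, using the stand-in types above.
pub trait ScalarUDFImpl {
    fn as_any(&self) -> &dyn Any;
    fn name(&self) -> &str;
    fn signature(&self) -> &Signature;
    fn return_type(&self, arg_types: &[DataType]) -> Result<DataType>;
    fn invoke(&self, args: &[ColumnarValue]) -> Result<ColumnarValue>;
    fn aliases(&self) -> &[String] {
        &[]
    }
}

/// Hypothetical UDF: adds one to its single Int64 argument.
pub struct AddOne {
    signature: Signature,
    aliases: Vec<String>,
}

impl AddOne {
    pub fn new() -> Self {
        Self {
            signature: Signature { arg_types: vec![DataType::Int64] },
            aliases: vec!["increment".to_string()],
        }
    }
}

impl ScalarUDFImpl for AddOne {
    fn as_any(&self) -> &dyn Any {
        self
    }

    fn name(&self) -> &str {
        "add_one"
    }

    fn signature(&self) -> &Signature {
        &self.signature
    }

    fn return_type(&self, arg_types: &[DataType]) -> Result<DataType> {
        match arg_types {
            [DataType::Int64] => Ok(DataType::Int64),
            other => Err(format!("add_one expects one Int64 argument, got {other:?}")),
        }
    }

    fn invoke(&self, args: &[ColumnarValue]) -> Result<ColumnarValue> {
        // In this sketch, overflow yields a null rather than a panic.
        let add = |v: &Option<i64>| v.and_then(|x| x.checked_add(1));
        match args {
            // Fast path: a constant argument stays a scalar, no array is built.
            [ColumnarValue::Scalar(v)] => Ok(ColumnarValue::Scalar(add(v))),
            // General path: apply the function element by element.
            [ColumnarValue::Array(values)] => {
                Ok(ColumnarValue::Array(values.iter().map(add).collect()))
            }
            _ => Err("add_one expects exactly one argument".to_string()),
        }
    }

    fn aliases(&self) -> &[String] {
        &self.aliases
    }
}
```

With these stand-ins, `AddOne::new().invoke(&[ColumnarValue::Scalar(Some(41))])` returns a scalar result rather than a one-element array, which is the case the performance note asks implementations to handle.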
   
   > I'm also curious.
   > 
   > Are there plans to ever allow udfs to support return types based off the 
input **value** instead of the **type**? I know there is currently special 
handling for `arrow_cast` because of this limitation.
   
   We discussed it in 
https://github.com/apache/arrow-datafusion/discussions/7657 and I just filed 
https://github.com/apache/arrow-datafusion/issues/8624
   
   Once we have a trait-based API, I believe we will be in a much better 
position to support this feature.

