alamb commented on issue #8568:
URL: https://github.com/apache/arrow-datafusion/issues/8568#issuecomment-1867610161
> Ideally, i'd prefer something like this for implementing scalar udfs.

@universalmind303 -- perhaps you could check out https://github.com/apache/arrow-datafusion/pull/8578 which proposes the following trait for implementing UDFs:

```rust
pub trait ScalarUDFImpl {
    /// Returns this object as an [`Any`] trait object
    fn as_any(&self) -> &dyn Any;

    /// Returns this function's name
    fn name(&self) -> &str;

    /// Returns the function's [`Signature`] for information about what input
    /// types are accepted and the function's Volatility.
    fn signature(&self) -> &Signature;

    /// What [`DataType`] will be returned by this function, given the types of
    /// the arguments
    fn return_type(&self, arg_types: &[DataType]) -> Result<DataType>;

    /// Invoke the function on `args`, returning the appropriate result
    ///
    /// The function will be invoked with a slice of [`ColumnarValue`]
    /// (either scalar or array).
    ///
    /// # Zero Argument Functions
    /// If the function has zero parameters (e.g. `now()`) it will be passed a
    /// single-element slice which is a null array to indicate the batch's row
    /// count (so the function can know the resulting array size).
    ///
    /// # Performance
    ///
    /// For the best performance, the implementations of `invoke` should handle
    /// the common case when one or more of their arguments are constant values
    /// (aka [`ColumnarValue::Scalar`]). Calling [`ColumnarValue::into_array`]
    /// and treating all arguments as arrays will work, but will be slower.
    fn invoke(&self, args: &[ColumnarValue]) -> Result<ColumnarValue>;

    /// Returns any aliases (alternate names) for this function.
    ///
    /// Aliases can be used to invoke the same function using different names.
    /// For example in some databases `now()` and `current_timestamp()` are
    /// aliases for the same function. This behavior can be obtained by
    /// returning `current_timestamp` as an alias for the `now` function.
    ///
    /// Note: `aliases` should only include names other than [`Self::name`].
    /// Defaults to `[]` (no aliases)
    fn aliases(&self) -> &[String] {
        &[]
    }
}
```

> I'm also curious.
>
> Are there plans to ever allow udfs to support return types based off the input **value** instead of the **type**? I know there is currently special handling for `arrow_cast` because of this limitation.

We discussed it in https://github.com/apache/arrow-datafusion/discussions/7657 and I just filed https://github.com/apache/arrow-datafusion/issues/8624

Once we have a trait-based API I believe we will be in a much better position to support this feature.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
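To make the shape of the proposed trait concrete, here is a minimal sketch of what an implementation might look like. It is not taken from the PR: `DataType`, `ColumnarValue`, `Signature`, and `Result` below are hypothetical, heavily simplified stand-ins so the example compiles on its own without the DataFusion or Arrow crates, and `AddOne` with its `increment` alias is an invented function. The part worth noticing is `invoke`, which handles the `Scalar` case directly instead of expanding it into an array, as the `# Performance` note suggests.

```rust
use std::any::Any;

// ASSUMPTION: simplified stand-ins for the real DataFusion/Arrow types.
// The real `DataType`, `ColumnarValue`, and `Signature` are much richer.
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Int64,
    Utf8,
}

#[derive(Debug, Clone, PartialEq)]
pub enum ColumnarValue {
    /// A single constant value (None means NULL)
    Scalar(Option<i64>),
    /// One value per row (None means NULL)
    Array(Vec<Option<i64>>),
}

pub struct Signature {
    pub arg_types: Vec<DataType>,
}

pub type Result<T> = std::result::Result<T, String>;

// The trait as quoted in the comment, minus the doc comments.
pub trait ScalarUDFImpl {
    fn as_any(&self) -> &dyn Any;
    fn name(&self) -> &str;
    fn signature(&self) -> &Signature;
    fn return_type(&self, arg_types: &[DataType]) -> Result<DataType>;
    fn invoke(&self, args: &[ColumnarValue]) -> Result<ColumnarValue>;
    fn aliases(&self) -> &[String] {
        &[]
    }
}

/// Hypothetical UDF: adds one to an Int64 argument.
pub struct AddOne {
    signature: Signature,
    aliases: Vec<String>,
}

impl AddOne {
    pub fn new() -> Self {
        Self {
            signature: Signature { arg_types: vec![DataType::Int64] },
            // Only names other than `name()` belong here.
            aliases: vec!["increment".to_string()],
        }
    }
}

impl ScalarUDFImpl for AddOne {
    fn as_any(&self) -> &dyn Any {
        self
    }

    fn name(&self) -> &str {
        "add_one"
    }

    fn signature(&self) -> &Signature {
        &self.signature
    }

    // The return type depends only on the argument *types*, not their values.
    fn return_type(&self, arg_types: &[DataType]) -> Result<DataType> {
        match arg_types {
            [DataType::Int64] => Ok(DataType::Int64),
            other => Err(format!("add_one expects one Int64 argument, got {:?}", other)),
        }
    }

    fn invoke(&self, args: &[ColumnarValue]) -> Result<ColumnarValue> {
        // In this sketch, integer overflow yields NULL rather than panicking.
        let add = |v: &Option<i64>| v.and_then(|x| x.checked_add(1));
        match args {
            // Fast path: a constant argument produces a constant result.
            [ColumnarValue::Scalar(v)] => Ok(ColumnarValue::Scalar(add(v))),
            // General path: apply the function row by row.
            [ColumnarValue::Array(values)] => {
                Ok(ColumnarValue::Array(values.iter().map(add).collect()))
            }
            _ => Err("add_one expects exactly one argument".to_string()),
        }
    }

    fn aliases(&self) -> &[String] {
        &self.aliases
    }
}
```

With these stand-ins, `AddOne::new().invoke(&[ColumnarValue::Scalar(Some(41))])` returns a `Scalar` rather than a one-row array, so a caller never has to materialize a constant argument for every row. Because `return_type` only sees `&[DataType]`, it also shows why a value-dependent return type (the `arrow_cast` case raised above) cannot be expressed through this method as proposed.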
