alamb commented on pull request #7967: URL: https://github.com/apache/arrow/pull/7967#issuecomment-683411727
> With this said, as an exercise, let me try to write how I imagine an interface could look like for option 3, just to check if I have the same understanding as you do.

I think I had a slightly different idea. Here is one idea for an interface for defining UDFs that I think covers all the cases you have in mind (though it doesn't talk about implementation at all):

## UDF Registration:

```
trait UDF {
  // Return the name that the user refers to in order to invoke this function
  fn name(&self) -> &str;

  // Return desired argument types.
  // If the desired type is `None`, then no type coercion is done and any number of arguments
  // is accepted during logical planning.
  // If the desired type is a slice, the logical planner will require that the function is called
  // with exactly that number of arguments and will attempt to coerce the arguments into these types.
  // If any type in the slice is `None`, then no coercion will be done on that argument.
  fn desired_argument_types(&self) -> Option<&[Option<DataType>]>;

  // Given the specified argument types, return true if this function can be called with them
  fn valid_argument_types(&self, arg_types: &[DataType]) -> bool;

  // Create the appropriate PhysicalExpression
  fn make_physical_expr(&self, arg_types: &[DataType]) -> Box<dyn PhysicalExpr>;
}
```

Here is a sketch of registering `sqrt` with both 32-bit and 64-bit variants:

```
struct sqrt_32 {}
impl UDF for sqrt_32 {
  fn name(&self) { "sqrt" }
  fn desired_argument_types(&self) { [Float32] }
  fn valid_argument_types(&self, arg_types: &[DataType]) { arg_types == [Float32] }
  fn make_physical_expr(&self, arg_types: &[DataType]) {...}
}

struct sqrt_64 {}
impl UDF for sqrt_64 {
  fn name(&self) { "sqrt" }
  fn desired_argument_types(&self) { [Float64] }
  fn valid_argument_types(&self, arg_types: &[DataType]) { arg_types == [Float64] }
  fn make_physical_expr(&self, arg_types: &[DataType]) {...}
}
```

The user would write `sqrt(c)`, and the type coercion logic would then change that to `sqrt_64(cast c as Float64)`, or perhaps `sqrt_32(c)` (if `c` was Float32).
And you can imagine the type coercion logic hitting a `sqrt` function, trying to coerce the arguments to Float32 first to match the first registered variant and, if that wasn't possible, trying to coerce them to Float64.

Here is an example of `concat`, which takes exactly two arguments of the same type:

```
struct concat {}
impl UDF for concat {
  fn name(&self) { "concat" }
  fn desired_argument_types(&self) { [None, None] }
  fn valid_argument_types(&self, arg_types: &[DataType]) { arg_types.len() == 2 && arg_types[0] == arg_types[1] }
  fn make_physical_expr(&self, arg_types: &[DataType]) {...}
}
```

Here is an example of an `array` function:

```
struct array {}
impl UDF for array {
  fn name(&self) { "array" }
  fn desired_argument_types(&self) { None }
  fn valid_argument_types(&self, arg_types: &[DataType]) { ... custom logic to make sure all types are the same here ... }
  fn make_physical_expr(&self, arg_types: &[DataType]) {...}
}
```
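
To make the "try Float32 first, then Float64" idea concrete, here is a minimal, compilable sketch of how a planner might walk the registered UDFs. Everything beyond the trait shape is an assumption made for illustration: the `DataType` enum is a small stand-in for Arrow's, `make_physical_expr` is left out, the widening-only `can_coerce` rule is invented, and `Sqrt32`, `Sqrt64`, `Concat`, `registry` and `resolve` are hypothetical names (the structs correspond to `sqrt_32`, `sqrt_64` and `concat` above). It is not how DataFusion implements this.

```rust
// Stand-in for Arrow's DataType (assumption: only the variants this sketch needs).
#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType { Int32, Float32, Float64, Utf8 }

// Trimmed version of the proposed trait; `make_physical_expr` is omitted here.
trait Udf {
    fn name(&self) -> &str;
    fn desired_argument_types(&self) -> Option<&[Option<DataType>]>;
    fn valid_argument_types(&self, arg_types: &[DataType]) -> bool;
}

const F32_ARGS: &[Option<DataType>] = &[Some(DataType::Float32)];
const F64_ARGS: &[Option<DataType>] = &[Some(DataType::Float64)];
const ANY_TWO: &[Option<DataType>] = &[None, None];

struct Sqrt32;
impl Udf for Sqrt32 {
    fn name(&self) -> &str { "sqrt" }
    fn desired_argument_types(&self) -> Option<&[Option<DataType>]> { Some(F32_ARGS) }
    fn valid_argument_types(&self, t: &[DataType]) -> bool { matches!(t, [DataType::Float32]) }
}

struct Sqrt64;
impl Udf for Sqrt64 {
    fn name(&self) -> &str { "sqrt" }
    fn desired_argument_types(&self) -> Option<&[Option<DataType>]> { Some(F64_ARGS) }
    fn valid_argument_types(&self, t: &[DataType]) -> bool { matches!(t, [DataType::Float64]) }
}

struct Concat;
impl Udf for Concat {
    fn name(&self) -> &str { "concat" }
    fn desired_argument_types(&self) -> Option<&[Option<DataType>]> { Some(ANY_TWO) }
    fn valid_argument_types(&self, t: &[DataType]) -> bool { t.len() == 2 && t[0] == t[1] }
}

// Hypothetical coercion rule: identical types, or lossless widening to Float64.
fn can_coerce(from: DataType, to: DataType) -> bool {
    use DataType::*;
    from == to || matches!((from, to), (Float32, Float64) | (Int32, Float64))
}

// Registration order matters: Sqrt32 is tried before Sqrt64.
fn registry() -> Vec<Box<dyn Udf>> {
    vec![Box::new(Sqrt32), Box::new(Sqrt64), Box::new(Concat)]
}

// Returns the index of the first UDF called `name` that accepts `args`,
// together with the argument types after coercion.
fn resolve(udfs: &[Box<dyn Udf>], name: &str, args: &[DataType]) -> Option<(usize, Vec<DataType>)> {
    for (i, udf) in udfs.iter().enumerate() {
        if udf.name() != name {
            continue;
        }
        let coerced: Option<Vec<DataType>> = match udf.desired_argument_types() {
            // No desired types: pass the arguments through untouched.
            None => Some(args.to_vec()),
            // A slice fixes the argument count.
            Some(wanted) if wanted.len() != args.len() => None,
            Some(wanted) => wanted
                .iter()
                .zip(args)
                .map(|(w, a)| match w {
                    None => Some(*a), // no coercion requested for this argument
                    Some(t) if can_coerce(*a, *t) => Some(*t),
                    Some(_) => None,
                })
                .collect(),
        };
        if let Some(types) = coerced {
            if udf.valid_argument_types(&types) {
                return Some((i, types));
            }
        }
    }
    None
}
```

Under these assumed rules, `resolve(&registry(), "sqrt", &[DataType::Int32])` skips `Sqrt32` (Int32 cannot be coerced to Float32 here) and selects `Sqrt64` with the argument cast to Float64, while a Float32 argument selects `Sqrt32` unchanged. A real planner would also need a policy for choosing among several viable candidates; first match in registration order is simply the behavior described in the comment.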