thinkharderdev commented on issue #8045: URL: https://github.com/apache/arrow-datafusion/issues/8045#issuecomment-1793267211
> > Would like to add that supporting serialization of user-defined functions would be quite nice.
>
> I don't understand this question @thinkharderdev 🤔
>
> The current approach to serialize a `ScalarUDF`, as I understand it, is to send over the function name along with the arguments
>
> https://github.com/apache/arrow-datafusion/blob/c2e768052c43e4bab6705ee76befc19de383c2cb/datafusion/proto/proto/datafusion.proto#L686-L689
>
> And then the physical plan deserialization looks that function up by name: https://docs.rs/datafusion-proto/latest/src/datafusion_proto/physical_plan/from_proto.rs.html#318-333
>
> The only difference compared to `BuiltInScalarFunction` is that `BuiltInScalarFunction` uses a number (in an enum), which might be slightly faster.

Sorry, what I mean is that it would be useful to be able to serialize constant parameters into the user-defined scalar function itself, rather than passing them in as expressions.

For instance, suppose you need a UDF that applies a regex you have as a static constant. Currently, the way to do that is to pass the pattern as a literal expression, but then the regex has to be compiled again for every batch the UDF processes. Ideally you could have something like:

```rust
use regex::Regex;

// The UDF owns its compiled regex as state
struct MyRegexUdf {
    regex: Regex,
}

// `ScalarFunction` is a hypothetical trait name, for illustration only
impl ScalarFunction for MyRegexUdf {
    // use regex on each value somehow
}
```

The regex would then be compiled only once, during deserialization (or construction), instead of once per batch.
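A minimal, std-only sketch of the idea, under stated assumptions: to avoid external crates, the "compiled regex" is replaced by a hypothetical `CompiledGlob` (a `*` wildcard pattern split into literal segments once), and `GlobMatchUdf`, `encode`, `decode`, and the name-keyed registry are all hypothetical, not DataFusion's API. The serialized form carries the function name (as today) plus the constant pattern; the receiving side looks the name up, compiles the pattern once while decoding, and every later batch reuses that compiled state.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a compiled regex: a `*` wildcard pattern
/// split into literal segments once, when it is compiled.
struct CompiledGlob {
    source: String,
    segments: Vec<String>,
}

impl CompiledGlob {
    fn compile(pattern: &str) -> Self {
        let segments = pattern.split('*').map(str::to_owned).collect();
        CompiledGlob { source: pattern.to_owned(), segments }
    }

    fn is_match(&self, s: &str) -> bool {
        let n = self.segments.len();
        let first = &self.segments[0];
        if n == 1 {
            return s == first.as_str(); // no wildcard: exact match
        }
        if !s.starts_with(first.as_str()) {
            return false;
        }
        let mut pos = first.len();
        // Middle segments must appear in order after the prefix.
        for seg in &self.segments[1..n - 1] {
            match s[pos..].find(seg.as_str()) {
                Some(i) => pos += i + seg.len(),
                None => return false,
            }
        }
        // The last segment must be a suffix that does not overlap `pos`.
        let last = &self.segments[n - 1];
        s.len() >= pos + last.len() && s.ends_with(last.as_str())
    }
}

/// Hypothetical UDF that keeps its constant parameter in compiled form.
struct GlobMatchUdf {
    glob: CompiledGlob,
}

impl GlobMatchUdf {
    const NAME: &'static str = "glob_match";

    /// Serialize the function name plus only the constant parameter.
    fn encode(&self) -> (String, Vec<u8>) {
        (Self::NAME.to_string(), self.glob.source.as_bytes().to_vec())
    }

    /// Deserialize: this is the only place the pattern gets compiled.
    fn decode(payload: &[u8]) -> Option<Self> {
        let pattern = std::str::from_utf8(payload).ok()?;
        Some(GlobMatchUdf { glob: CompiledGlob::compile(pattern) })
    }

    /// Evaluate one batch, reusing the already-compiled pattern.
    fn evaluate(&self, batch: &[&str]) -> Vec<bool> {
        batch.iter().map(|s| self.glob.is_match(s)).collect()
    }
}

type Decoder = fn(&[u8]) -> Option<GlobMatchUdf>;

fn main() {
    // Name-keyed registry, mirroring lookup-by-name on the receiving side.
    let mut registry: HashMap<&str, Decoder> = HashMap::new();
    registry.insert(GlobMatchUdf::NAME, GlobMatchUdf::decode);

    let original = GlobMatchUdf::decode(b"err*timeout").unwrap();
    let (name, payload) = original.encode();

    // Receiving side: look up by name, decode (and compile) exactly once.
    let udf = registry[name.as_str()](payload.as_slice()).unwrap();
    for batch in [["error: timeout", "ok"], ["err_timeout", "timeout err"]] {
        println!("{:?}", udf.evaluate(&batch));
    }
}
```

The design choice is simply to move the expensive step from `evaluate` (per batch) into `decode` (per plan), which is the trade-off the comment describes for a regex held as UDF state.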
