thinkharderdev commented on issue #8045:
URL: https://github.com/apache/arrow-datafusion/issues/8045#issuecomment-1793267211

   > > Would like to add that supporting serialization of user-defined functions would be quite nice.
   > 
   > I don't understand this question @thinkharderdev 🤔
   > 
   > The current approach to serializing a `ScalarUDF`, as I understand it, is to send over the function name along with the arguments:
   > 
   > 
   > https://github.com/apache/arrow-datafusion/blob/c2e768052c43e4bab6705ee76befc19de383c2cb/datafusion/proto/proto/datafusion.proto#L686-L689
   > 
   > And then the physical plan deserialization looks that function up by name: https://docs.rs/datafusion-proto/latest/src/datafusion_proto/physical_plan/from_proto.rs.html#318-333
   > 
   > The only difference compared to `BuiltInScalarFunction` is that `BuiltInScalarFunction` is identified by a number (an enum variant) rather than a name, which might be slightly faster.
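
   For reference, the name-based round trip described in the quote can be sketched roughly as below. This is a simplified, std-only illustration and not datafusion-proto's actual code. `UdfCallNode`, `Registry`, `ScalarFn` and `to_upper` are all hypothetical names, and the arguments are plain strings standing in for serialized expressions. The point is that only the name travels over the wire, so the receiving side must already have a function registered under that same name.

   ```rust
   use std::collections::HashMap;

   /// Hypothetical, simplified wire form of a UDF call: the function name plus
   /// the serialized arguments (plain strings here instead of expressions).
   #[derive(Debug, Clone, PartialEq)]
   struct UdfCallNode {
       name: String,
       args: Vec<String>,
   }

   /// Hypothetical signature for a scalar function over one string value.
   type ScalarFn = fn(&str) -> String;

   fn to_upper(s: &str) -> String {
       s.to_uppercase()
   }

   /// Hypothetical registry on the deserializing side, keyed by function name.
   struct Registry {
       udfs: HashMap<String, ScalarFn>,
   }

   impl Registry {
       fn new() -> Self {
           Registry { udfs: HashMap::new() }
       }

       fn register(&mut self, name: &str, f: ScalarFn) {
           self.udfs.insert(name.to_string(), f);
       }

       /// Deserialization step: resolve the implementation by name, then apply
       /// it to each argument. Fails if the receiver never registered the name.
       fn eval(&self, node: &UdfCallNode) -> Result<Vec<String>, String> {
           let f: ScalarFn = *self
               .udfs
               .get(&node.name)
               .ok_or_else(|| format!("unknown UDF: {}", node.name))?;
           Ok(node.args.iter().map(|a| f(a.as_str())).collect())
       }
   }

   fn main() {
       // Sender side: only the name and the arguments are serialized.
       let node = UdfCallNode { name: "my_upper".to_string(), args: vec!["abc".to_string()] };

       // Receiver side: the same name must be registered before evaluation.
       let mut registry = Registry::new();
       registry.register("my_upper", to_upper);
       println!("{:?}", registry.eval(&node)); // prints Ok(["ABC"])
   }
   ```

   A built-in skips the registry and maps an integer tag straight back to an enum variant. That is where the small speed difference mentioned above might come from.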
   
   Sorry, what I mean is that it would be useful to be able to serialize constant parameters into the user-defined scalar functions themselves rather than pass them in as expressions. For instance, suppose you need a UDF that does something with a regex you have as a static constant. Currently the way to do that is to pass the regex as a literal expression, but then you have to compile it again for every batch you process through the UDF. Ideally you could have something like:
   
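   ```rust
   use regex::Regex;

   // The UDF owns its compiled regex instead of receiving it as an argument
   struct MyRegexUdf {
     regex: Regex,
   }

   // `ScalarFunction` is an illustrative trait name, not an existing API
   impl ScalarFunction for MyRegexUdf {
     // apply self.regex to each value in the batch
   }
   ```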
   
   The regex would only need to be compiled once during deserialization (or 
construction) instead of once for each batch. 
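
   To make the difference concrete, here is a std-only sketch. It assumes a hypothetical `ScalarFunction` trait with a single `invoke` method, which is not an existing DataFusion API. `CompiledPattern` is a stand-in for `regex::Regex` whose constructor counts how often it runs. `to_bytes`/`from_bytes` are hypothetical serialization hooks that encode only the constant pattern. `LiteralArgUdf` models today's approach, where the pattern arrives as a literal argument and is compiled on every call. `MyRegexUdf` compiles the pattern once in `from_bytes` and reuses it for every batch.

   ```rust
   use std::sync::atomic::{AtomicUsize, Ordering};

   // Counts how often a pattern is compiled, so the two approaches can be compared.
   static COMPILE_COUNT: AtomicUsize = AtomicUsize::new(0);

   /// Hypothetical stand-in for `regex::Regex`: a substring matcher whose
   /// construction is treated as the expensive step.
   struct CompiledPattern {
       needle: String,
   }

   impl CompiledPattern {
       fn compile(pattern: &str) -> Self {
           COMPILE_COUNT.fetch_add(1, Ordering::SeqCst);
           CompiledPattern { needle: pattern.to_string() }
       }

       fn is_match(&self, s: &str) -> bool {
           s.contains(self.needle.as_str())
       }
   }

   /// Hypothetical UDF trait: evaluate the function over one batch of strings.
   trait ScalarFunction {
       fn invoke(&self, batch: &[&str]) -> Vec<bool>;
   }

   /// Today's approach: the pattern is a literal argument, compiled on every call.
   struct LiteralArgUdf;

   impl LiteralArgUdf {
       fn invoke(&self, pattern_literal: &str, batch: &[&str]) -> Vec<bool> {
           let compiled = CompiledPattern::compile(pattern_literal);
           batch.iter().map(|s| compiled.is_match(s)).collect()
       }
   }

   /// Proposed approach: the UDF owns its compiled pattern.
   struct MyRegexUdf {
       pattern: String,           // kept so the UDF can be serialized again
       compiled: CompiledPattern, // built once, reused for every batch
   }

   impl MyRegexUdf {
       fn new(pattern: &str) -> Self {
           MyRegexUdf { pattern: pattern.to_string(), compiled: CompiledPattern::compile(pattern) }
       }

       /// Hypothetical serialization hook: only the constant parameter is encoded.
       fn to_bytes(&self) -> Vec<u8> {
           self.pattern.as_bytes().to_vec()
       }

       /// Hypothetical deserialization hook: the pattern is compiled exactly once here.
       fn from_bytes(bytes: &[u8]) -> Result<Self, std::str::Utf8Error> {
           Ok(MyRegexUdf::new(std::str::from_utf8(bytes)?))
       }
   }

   impl ScalarFunction for MyRegexUdf {
       fn invoke(&self, batch: &[&str]) -> Vec<bool> {
           batch.iter().map(|s| self.compiled.is_match(s)).collect()
       }
   }

   fn main() {
       let batches: Vec<Vec<&str>> = vec![vec!["foo", "bar"], vec!["food", "baz"], vec!["xfoo"]];

       // Literal-argument approach: one compilation per batch.
       COMPILE_COUNT.store(0, Ordering::SeqCst);
       for batch in &batches {
           LiteralArgUdf.invoke("foo", batch);
       }
       println!("literal-argument compiles: {}", COMPILE_COUNT.load(Ordering::SeqCst));

       // Embedded-parameter approach: serialize, deserialize (one compile), reuse.
       let bytes = MyRegexUdf::new("foo").to_bytes();
       COMPILE_COUNT.store(0, Ordering::SeqCst);
       let udf = MyRegexUdf::from_bytes(&bytes).unwrap();
       for batch in &batches {
           udf.invoke(batch);
       }
       println!("embedded-parameter compiles: {}", COMPILE_COUNT.load(Ordering::SeqCst));
   }
   ```

   With these three batches, the program prints `literal-argument compiles: 3` followed by `embedded-parameter compiles: 1`. The per-batch cost grows with the number of batches, while the embedded version pays it once.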


-- 
This is an automated message from the Apache Git Service.