westonpace commented on PR #13500: URL: https://github.com/apache/arrow/pull/13500#issuecomment-1176780265
The approach, if I'm understanding correctly, is to use C++ to make two passes through the plan (or maybe it's one pass). The first pass gets all the UDFs out of the plan. pyarrow then unpickles and registers those UDFs. The second pass actually consumes the plan, using a registry that contains those unpickled functions.

This wouldn't be my first approach. I think I'd prefer adding another callback, like the consumer_factory, for UDF handling. This would make it easier to handle situations where there are alternative UDF handlers or where, for example, a C++ or R user still wants to be able to run Python UDFs.

However, I'm not opposed to this approach. The end pyarrow interface to the user is still just "Substrait in -> data out", so if we wanted to move to a different approach in the future that would be fine.
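To make the comparison concrete, here is a rough sketch of the two shapes being discussed. It is not the PR's code and not a pyarrow API: the dict-based plan, `collect_udfs`, `consume`, `run_two_pass`, `run_with_callback`, and the `udf_handler` signature are all hypothetical stand-ins, and plain `pickle` on built-in functions stands in for whatever serialization the real UDFs use.

```python
import math
import pickle

# Hypothetical stand-in for a deserialized plan: embedded UDF payloads
# plus a list of (function name, argument) calls. Not a real Substrait plan.
plan = {
    "udfs": [
        {"name": "sqrt", "payload": pickle.dumps(math.sqrt)},
        {"name": "magnitude", "payload": pickle.dumps(abs)},
    ],
    "calls": [("sqrt", 16.0), ("magnitude", -3)],
}


def collect_udfs(plan):
    """Pass 1: pull every (name, payload) pair out of the plan."""
    return [(u["name"], u["payload"]) for u in plan["udfs"]]


def consume(plan, registry):
    """Pass 2: execute the plan against an already populated registry."""
    return [registry[name](arg) for name, arg in plan["calls"]]


def run_two_pass(plan):
    """Two-pass shape: extract, unpickle and register, then consume."""
    registry = {}
    for name, payload in collect_udfs(plan):
        registry[name] = pickle.loads(payload)
    return consume(plan, registry)


def run_with_callback(plan, udf_handler):
    """Callback shape: the consumer asks a caller-supplied handler to
    resolve each UDF the first time it is encountered."""
    payloads = {u["name"]: u["payload"] for u in plan["udfs"]}
    registry = {}
    results = []
    for name, arg in plan["calls"]:
        if name not in registry:
            registry[name] = udf_handler(name, payloads[name])
        results.append(registry[name](arg))
    return results


def python_handler(name, payload):
    """One possible handler; another language binding could supply its own."""
    return pickle.loads(payload)


two_pass_result = run_two_pass(plan)
callback_result = run_with_callback(plan, python_handler)
```

In this toy version both shapes produce the same results. The difference is only in who decides how a UDF payload is turned into something callable: the driver in the two-pass shape, the caller's handler in the callback shape.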
