bkietz commented on PR #14043: URL: https://github.com/apache/arrow/pull/14043#issuecomment-1255239682
> First, a user may directly use this where they would use the original function execution API.

I understand, but *when do users touch* the function execution API? I think that'd primarily be through the Python or R bindings to handle ad-hoc cases like adding two arrays together... and in that case, constructing a FunctionExecutor would not be useful, since the user input time delay will greatly outweigh kernel lookup. A FunctionExecutor would only be useful when executing the same function multiple times, for example when applied to multiple batches from a stream of data. What I'd like to hear is when that's beneficial *and* isn't served by construction of an ExecPlan.

> Second, as noted above, my motivation for this is related to UDFs, where their kernel would be preconfigured once then executed multiple times over a stream of batches (the kernel state ends up holding Python stuff). It's possible this kernel-preconfiguration can be integrated into expression binding too; I haven't looked into this.

Kernel preconfiguration is precisely the function of Expression::Bind, among other things:

- [invokes Function::DispatchBest](https://github.com/apache/arrow/blob/40ec95646962cccdcd62032c80e8506d4c275bc6/cpp/src/arrow/compute/exec/expression.cc#L372) to acquire a kernel and types for implicit casts
- [caches the Kernel and its state](https://github.com/apache/arrow/blob/40ec95646962cccdcd62032c80e8506d4c275bc6/cpp/src/arrow/compute/exec/expression.h#L54-L58) for later use in execution
  - note that currently Expression execution assumes only scalar functions are referenced and that KernelState is not mutated

In short, it seems that we won't be able to use FunctionExecutor where it seems to me we'd most like to see UDF capabilities: in filter and project expressions in ExecPlans.
Since that will eventually require refactoring/extension of the Expression utilities, I'd prefer we start there so that we can have a better picture of the ways ExecPlan etc will need to change to accommodate UDFs. Building parallel streaming execution functionality which will ultimately need to be accommodated or assimilated by ExecPlans seems like much more churn.
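
To make the "dispatch once, execute over many batches" point concrete, here is a toy sketch using only the C++ standard library. It is not Arrow code: `ToyRegistry`, `CallFunction`, `BoundCall`, `Batch` and `Kernel` are hypothetical stand-ins invented for illustration, and the lookup counter is only there to show where the dispatch cost is paid.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-ins; none of these are Arrow types.
using Batch = std::vector<double>;
using Kernel = std::function<Batch(const Batch&, const Batch&)>;

// Toy function registry that counts how often a kernel is looked up.
struct ToyRegistry {
  std::map<std::string, Kernel> kernels;
  int lookups = 0;

  ToyRegistry() {
    kernels["add"] = [](const Batch& a, const Batch& b) {
      assert(a.size() == b.size());
      Batch out(a.size());
      for (std::size_t i = 0; i < a.size(); ++i) out[i] = a[i] + b[i];
      return out;
    };
  }

  // Stand-in for dispatch: resolve a function name to a kernel.
  const Kernel& Dispatch(const std::string& name) {
    ++lookups;
    auto it = kernels.find(name);
    if (it == kernels.end()) throw std::invalid_argument("no kernel: " + name);
    return it->second;
  }
};

// Ad-hoc call: pays for dispatch on every invocation.
Batch CallFunction(ToyRegistry& registry, const std::string& name,
                   const Batch& a, const Batch& b) {
  return registry.Dispatch(name)(a, b);
}

// Bound call: dispatches once at construction, then reuses the cached kernel.
class BoundCall {
 public:
  BoundCall(ToyRegistry& registry, const std::string& name)
      : kernel_(registry.Dispatch(name)) {}

  Batch Execute(const Batch& a, const Batch& b) const { return kernel_(a, b); }

 private:
  Kernel kernel_;  // cached copy of the resolved kernel
};
```

Calling `CallFunction` once per batch increments `lookups` once per batch, while a single `BoundCall` reused across the same batches increments it once in total. For a one-off interactive call the difference is negligible, which is the point made above about ad-hoc use from the bindings.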
