bkietz commented on PR #14043:
URL: https://github.com/apache/arrow/pull/14043#issuecomment-1255239682

   > First, a user may directly use this where they would use the original 
function execution API.
   
   I understand, but *when do users touch* the function execution API? I think 
that'd primarily be through the Python or R bindings, to handle ad-hoc cases 
like adding two arrays together. In that case, constructing a 
FunctionExecutor would not be useful, since the time spent waiting on 
interactive user input will greatly outweigh the cost of kernel lookup.
   
   A FunctionExecutor would only be useful when executing the same function 
multiple times, for example when it is applied to multiple batches from a 
stream of data. What I'd like to hear is when that's beneficial *and* isn't 
already served by constructing an ExecPlan.
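
   To make the cost being discussed concrete, here is a minimal sketch of 
per-call lookup versus lookup-once-then-reuse over a stream of batches. It is 
a toy model and not the Arrow API: `Registry`, `CallFunction`, 
`PreboundFunction` and `MakeRegistry` are hypothetical names invented for 
illustration, and a "batch" is just a vector of integers.

```cpp
// Toy model only; none of these names come from Arrow.
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

using Batch = std::vector<int64_t>;
using Kernel = std::function<Batch(const Batch&, const Batch&)>;

// A registry that counts how many kernel lookups were performed.
struct Registry {
  std::map<std::string, Kernel> kernels;
  mutable int lookups = 0;

  const Kernel& Lookup(const std::string& name) const {
    ++lookups;
    auto it = kernels.find(name);
    if (it == kernels.end()) throw std::invalid_argument("no kernel: " + name);
    return it->second;
  }
};

// Ad-hoc path: the kernel is resolved again on every call.
Batch CallFunction(const Registry& reg, const std::string& name,
                   const Batch& a, const Batch& b) {
  return reg.Lookup(name)(a, b);
}

// Streaming path: the kernel is resolved once and reused for every batch.
class PreboundFunction {
 public:
  PreboundFunction(const Registry& reg, const std::string& name)
      : kernel_(reg.Lookup(name)) {}

  Batch operator()(const Batch& a, const Batch& b) const {
    return kernel_(a, b);
  }

 private:
  Kernel kernel_;
};

// Builds a registry holding a single element-wise "add" kernel.
Registry MakeRegistry() {
  Registry reg;
  reg.kernels["add"] = [](const Batch& a, const Batch& b) {
    if (a.size() != b.size()) throw std::invalid_argument("length mismatch");
    Batch out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) out[i] = a[i] + b[i];
    return out;
  };
  return reg;
}
```

   In this sketch, calling `CallFunction` once per batch performs one lookup 
per batch, while a `PreboundFunction` performs a single lookup no matter how 
many batches follow. The saving only matters when the same function is 
applied repeatedly, which is the streaming case described above.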
   
   > Second, as noted above, my motivation for this is related to UDFs, where 
their kernel would be preconfigured once then executed multiple times over a 
stream of batches (the kernel state ends up holding Python stuff). It's 
possible this kernel-preconfiguration can be integrated into expression binding 
too; I haven't looked into this.
   
   Kernel preconfiguration is precisely the function of Expression::Bind, among 
other things:
   - [invokes Function::DispatchBest](https://github.com/apache/arrow/blob/40ec95646962cccdcd62032c80e8506d4c275bc6/cpp/src/arrow/compute/exec/expression.cc#L372)
     to acquire a kernel and types for implicit casts
   - [caches the Kernel and its state](https://github.com/apache/arrow/blob/40ec95646962cccdcd62032c80e8506d4c275bc6/cpp/src/arrow/compute/exec/expression.h#L54-L58)
     for later use in execution
   - note that currently Expression execution assumes only scalar functions are 
referenced and that KernelState is not mutated
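
   The bind-then-execute pattern in the list above can be sketched as 
follows. This is a simplified stand-in and not the actual Expression code: 
`Type`, `Column`, `Kernel`, `KernelState`, `BoundCall`, `Bind` and `Execute` 
are hypothetical, only one kernel exists, and column storage is reduced to 
`double` so that the "implicit cast" is merely recorded rather than performed.

```cpp
// Toy model of "bind once, execute many"; not the Arrow API.
#include <memory>
#include <stdexcept>
#include <vector>

enum class Type { Int32, Float64 };

struct Column {
  Type type;
  std::vector<double> values;  // storage simplified to double for brevity
};

struct KernelState {
  double factor;  // configured once at bind time, never mutated afterwards
};

struct Kernel {
  Type input_type;
  double (*exec)(const KernelState&, double);
};

double MultiplyFloat64(const KernelState& state, double x) {
  return state.factor * x;
}

// The only kernel registered in this toy; it accepts Float64 input.
const Kernel kMultiplyFloat64{Type::Float64, &MultiplyFloat64};

// Result of binding: the chosen kernel, its state, and the cast decision.
struct BoundCall {
  Type bound_input_type;
  const Kernel* kernel;
  std::shared_ptr<const KernelState> state;
  bool implicit_cast;  // true if input must be cast to kernel->input_type
};

// Stand-in for dispatch plus caching: pick a kernel for the input type,
// note whether an implicit cast is needed, and build the state once.
BoundCall Bind(Type input_type, double factor) {
  BoundCall bound;
  bound.bound_input_type = input_type;
  bound.kernel = &kMultiplyFloat64;
  bound.implicit_cast = (input_type != bound.kernel->input_type);
  bound.state = std::make_shared<KernelState>(KernelState{factor});
  return bound;
}

// Execution reuses the cached kernel and reads, but never writes, the state.
Column Execute(const BoundCall& bound, const Column& input) {
  if (input.type != bound.bound_input_type) {
    throw std::invalid_argument("input type differs from the bound type");
  }
  Column out{bound.kernel->input_type, {}};
  out.values.reserve(input.values.size());
  for (double v : input.values) {
    out.values.push_back(bound.kernel->exec(*bound.state, v));
  }
  return out;
}
```

   A caller would invoke `Bind` once and then pass each incoming batch to 
`Execute`. Holding the state behind a pointer-to-const mirrors the assumption 
noted above that execution does not mutate KernelState.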
   
   In short, it seems we won't be able to use FunctionExecutor where I think 
we'd most like to see UDF capabilities: in filter and project expressions in 
ExecPlans. Since supporting UDFs there will eventually require 
refactoring/extending the Expression utilities, I'd prefer we start there, so 
that we get a better picture of how ExecPlan etc. will need to change to 
accommodate UDFs. Building parallel streaming execution functionality that 
will ultimately need to be accommodated or assimilated by ExecPlans seems like 
much more churn.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
