[GitHub] [arrow] vibhatha commented on pull request #12590: ARROW-15639 [C++][Python] UDF Scalar Function Implementation

GitBox Tue, 05 Apr 2022 05:00:24 -0700


vibhatha commented on PR #12590:
URL: https://github.com/apache/arrow/pull/12590#issuecomment-1088616557


   @westonpace @jorisvandenbossche I am looking into the reviews and the
   missing pieces of this PR.
   
   # Multiple Kernel Registration and Unregister/Update an existing function 
   
   Having looked into the usability limitations in this PR, I propose the
   following.
   
   Here are the scenarios we intend to tackle.
   
   1. The user knows all the kernels required for a given function definition,
   meaning the behaviour of the function is known and all the input and output
   data types are known.
   
   In this case, what we have to do is accept a list of input and output types 
and the corresponding parameters required to add a
   kernel. This could be a minor modification to the existing function 
registration API.
   
   2. Unregistration/Update
   
   This is the ability to update or remove an existing function definition.
   To enable this, the base requirement is being able to unregister a
   function. The unregister_function can return the removed function, which
   can then be flushed or updated.
   
   3. Another case is where the user doesn't know all the kernels required for
   a given function definition at first, or needs to add them dynamically as
   the program requires. For instance, the user first registers the function
   to support input types of int32 and int64 and later wants to add a kernel
   for float32. This kind of behaviour is mostly seen among notebook users
   working on data science problems. This is a value-added utility for users.
   
   In this case, we need the unregister function so that we can pop the
   already registered kernels and re-register them together with the new
   kernels. This is a usability feature, so it has to be handled internally
   and the user shouldn't have to take care of these details. It becomes
   non-trivial when the data has to be handled dynamically and the required
   data types are not all known beforehand, but this could be a very rare
   case. It is a trivial problem when the user code merely defines the
   kernels in various parts of the code base, because the user can avoid that
   by defining all the functions in one place.
   
   In summary, this comes down to 3 use cases, or 2 if the usability piece
   (case 3) is left out.
   
   
   When we address these cases, we can take care of the issues associated with 
editing functions and registering multiple kernels for the same function 
(function name). 
   
   I propose creating a separate PR to include these changes. WDYT?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
