[ 
https://issues.apache.org/jira/browse/ARROW-16211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17614131#comment-17614131
 ] 

Yaron Gvili commented on ARROW-16211:
-------------------------------------

I'm not against a specific and simple solution for a simple use case, and 
you're welcome to pursue it. In this discussion, my main aim was to explain how 
all use cases discussed here are supported in a straightforward way using 
nested registries and without the need to modify a registry instance while in 
use.

> In the regular case it is just always `pc.call_function(..)`, in this case we 
> have to always make sure we do `registry_x.call_function`, isn't it?

With this question, the discussion shifts from whether registry function 
removal is necessary (I argued it isn't) to how best to design a user API for 
calling registry functions in the context of at least this use case.

I argue we can design a user API that encapsulates the active registry, so that 
the function caller need not remember it, as follows. The execution context 
could manage a stack of nested registries, so that a call-function invocation 
would automatically look up the registry at the top of the stack. When a piece 
of code wants to set up a nested registry for a second piece of code it intends 
to invoke, it does so by adding the nested registry to this stack, invoking the 
second piece of code, and popping the stack. This context stack management 
ensures the correct registry instance is always in scope.
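
To make the idea concrete, here is a minimal Python sketch of such a stack. 
Every name in it (`Registry`, `nested_registry`, `current_registry`, and this 
`call_function`) is hypothetical and for illustration only; none of them is an 
existing Arrow or PyArrow API, and a real design could differ.

```python
from contextlib import contextmanager
from contextvars import ContextVar


class Registry:
    """Hypothetical function registry with an optional parent (not an Arrow API)."""

    def __init__(self, parent=None):
        self._parent = parent
        self._functions = {}

    def register(self, name, func):
        # Only this instance is modified; the parent is never touched.
        self._functions[name] = func

    def get(self, name):
        # Look in this registry first, then fall back along the parent chain.
        registry = self
        while registry is not None:
            if name in registry._functions:
                return registry._functions[name]
            registry = registry._parent
        raise KeyError(f"no function named {name!r}")


default_registry = Registry()

# The "execution context": an immutable tuple used as a stack of registries.
_registry_stack = ContextVar("registry_stack", default=(default_registry,))


def current_registry():
    """Return the registry at the top of the stack."""
    return _registry_stack.get()[-1]


@contextmanager
def nested_registry():
    """Push a child of the active registry, yield it, and pop it on exit."""
    child = Registry(parent=current_registry())
    token = _registry_stack.set(_registry_stack.get() + (child,))
    try:
        yield child
    finally:
        _registry_stack.reset(token)


def call_function(name, *args):
    # Callers never name a registry; the top of the stack is used.
    return current_registry().get(name)(*args)


default_registry.register("add_one", lambda x: x + 1)

with nested_registry() as registry:
    # Shadow "add_one" for the enclosed code only.
    registry.register("add_one", lambda x: x + 100)
    inside = call_function("add_one", 1)

outside = call_function("add_one", 1)
print(inside, outside)  # prints: 101 2
```

In this sketch the override is visible only inside the `with` block, and the 
default registry is never edited. Using `contextvars` keeps the stack separate 
per thread or task, which is one possible way to avoid the race conditions 
that in-place removal could introduce.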

Of course, the fact that we can doesn't mean that we must. My aim with this 
point is to show that there is a well-designed alternative to registry function 
removal.

> While with an approach of the ability to just drop what you don't need is way 
> easier.

IMHO, it's a bit easier (e.g., removing a function from an existing registry 
instance vs creating a nested registry instance and removing from it) but less 
safe (potential side-effects and race conditions). A design tension between 
usability and safety is common, and calls for prioritization. My vote is to 
prioritize safety.

> May be we should also allow the ability to unregister/override functions. 
> That would provide flexibility for the users to use the UDFs for the said 
> scenarios.

If I had to accept this way of editing a registry, I'd say that the docs would 
need to be very clear about the safety issues this practice raises and to 
describe a safer alternative, as discussed here. I think that if the safer 
alternative is not available through an easy API (like the one I described), 
then users will surely use the less-safe one. This is why I think adding these 
docs is somewhat better but still insufficient for safety.

> [C++][Python] Unregister compute functions
> ------------------------------------------
>
>                 Key: ARROW-16211
>                 URL: https://issues.apache.org/jira/browse/ARROW-16211
>             Project: Apache Arrow
>          Issue Type: Sub-task
>          Components: C++, Python
>            Reporter: Vibhatha Lakmal Abeykoon
>            Assignee: Vibhatha Lakmal Abeykoon
>            Priority: Major
>
> In general, when using UDFs, the user defines a function expecting a 
> particular outcome. When building the program, there needs to be a way to 
> update existing function kernels if it expands beyond what is planned before. 
> In such situations, there should be a way to remove the existing definition 
> and add a new definition. To enable this, the unregister functionality has to 
> be included. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
