[ https://issues.apache.org/jira/browse/ARROW-16211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17614131#comment-17614131 ]
Yaron Gvili commented on ARROW-16211:
-------------------------------------

I'm not against a specific and simple solution for a simple use case, and you're welcome to pursue it. In this discussion, my main aim was to explain how all use cases discussed here are supported in a straightforward way using nested registries and without the need to modify a registry instance while in use.

> In the regular case it is just always `pc.call_function(..)`, in this case we
> have to always make sure we do `registry_x.call_function`, isn't it?

With this question, the discussion shifts from whether registry function removal is necessary (I argued it isn't) to how best to design a user API for calling registry functions in the context of at least this use case. I argue we can design a user API that encapsulates the active registry, so that the function caller need not remember it, as follows. The execution context could manage a stack of nested registries, so that a call-function invocation would automatically look up the registry at the top of the stack. When a piece of code wants to set up a nested registry for a second piece of code it intends to invoke, it does so by pushing the nested registry onto this stack, invoking the second piece of code, and popping the stack. This context stack management ensures the correct registry instance is always in scope. Of course, the fact that we can doesn't mean that we must. My aim in this point is to show that there is a well-designed alternative to registry function removal.

> While with an approach of the ability to just drop what you don't need is way
> easier.

IMHO, it's a bit easier (e.g., removing a function from an existing registry instance vs creating a nested registry instance and removing from it) but less safe (potential side-effects and race conditions). A design tension between usability and safety is common, and calls for prioritization. My vote is to prioritize safety.

> May be we should also allow the ability to unregister/override functions.
> That would provide flexibility for the users to use the UDFs for the said
> scenarios.

If I'm forced to accept this way of registry editing, I'd say that the docs would then need to be very clear about the safety issues this practice raises and describe a safer alternative as discussed here. I think that if the safer alternative is not implemented via an easy API (like the one I described), then users will surely practice the less-safe alternative. This is why I view adding these docs as a bit better but still insufficient for safety.

> [C++][Python] Unregister compute functions
> ------------------------------------------
>
>                 Key: ARROW-16211
>                 URL: https://issues.apache.org/jira/browse/ARROW-16211
>             Project: Apache Arrow
>          Issue Type: Sub-task
>          Components: C++, Python
>            Reporter: Vibhatha Lakmal Abeykoon
>            Assignee: Vibhatha Lakmal Abeykoon
>            Priority: Major
>
> In general, when using UDFs, the user defines a function expecting a
> particular outcome. When building the program, there needs to be a way to
> update existing function kernels if it expands beyond what is planned before.
> In such situations, there should be a way to remove the existing definition
> and add a new definition. To enable this, the unregister functionality has to
> be included.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)