Hi all, 
I’ve been thinking about some sort of refactoring of this registry for a while 
now, and I’ve written down some thoughts, please leave your comments.

https://docs.google.com/document/d/1LAN9I_Y9cZaG2a84j1wLY8jSlK3gDXYMle-VtyFCAE8/edit?usp=sharing
 
<https://docs.google.com/document/d/1LAN9I_Y9cZaG2a84j1wLY8jSlK3gDXYMle-VtyFCAE8/edit?usp=sharing>

I agree with Weston that starting with a small set of kernels and drilling down 
the implementation until we have a design with the characteristics we want 
seems like a good idea. One point to consider is the tradeoff between the size 
of our set of guinea pig kernels and how representative they are of the rest. 
We will probably need to cover nested types as Weston said, aggregations, etc. 

Sasha

> On Jun 3, 2022, at 2:35 AM, Weston Pace <weston.p...@gmail.com> wrote:
> 
> That approach looks great and very much in line with some of the stuff
> we have in light_array.h so I think it's very compatible.  If you have
> the time to push this refactoring through then go for it.  Don't let
> anything I'm saying deter any ongoing efforts.
> 
> I'm just advocating that we be open to any kind of incremental
> approach.  I looked a little closer at this today and you and Antoine
> are correct that "two function registries" is maybe not the right
> extension point.  Instead it might be that we have two classes of
> scalar functions.  For example, the "at most three buffers" ArraySpan
> that you've linked only works for types without children.  We do have
> some scalar functions that run on nested types (e.g. count, index,
> etc.)  My point is that rather than force ourselves to deal with all
> of these corner cases we should be open to an incremental solution.
> Maybe there is a ScalarPrimitive and a ScalarNested kernel type or
> Scalar and ScalarSpan.
> 
> Also, if we know we are also going to want to tweak the output
> interface (I don't know for sure if we will) then maybe it makes sense
> to pick a small set of kernels and incrementally improve that small
> set until we think we've made all the changes we are going to want for
> the near future, and then promote those changes to the wider set of
> kernels.
> 
> On Thu, Jun 2, 2022 at 8:07 AM Wes McKinney <wesmck...@gmail.com> wrote:
>> 
>> On this topic, I actually have started prototyping a new ScalarKernel
>> exec interface that uses a non-owning, shared_ptr-free "ArraySpan"
>> data structure based on some prior conversations
>> 
>> https://github.com/wesm/arrow/blob/711fd5e5665c280540bbaf48a48ca1eca1b91bff/cpp/src/arrow/compute/exec.h#L163
>> https://github.com/wesm/arrow/blob/711fd5e5665c280540bbaf48a48ca1eca1b91bff/cpp/src/arrow/compute/kernel.h#L552
>> 
>> My idea was to refactor just the existing scalar kernels (which by my
>> estimation seems tractable since so many go through the common
>> templated kernel generators) and leave the other kinds of kernels
>> untouched so that they can be refactored in due course. I hadn't
>> planned to try to tackle changing the "output" interface from
>> shared_ptr<ArrayData> because there are some kernels that create new
>> ArrayData and some that don't.
>> 
>> If this seems like a reasonable path, I can try to have a first draft
>> PR ready to go maybe by Monday (I was going to work on this over the
>> weekend when I can have some uninterrupted time to do the
>> refactoring). I'm not sure that a new registry is going to be needed
>> 
>> On Thu, Jun 2, 2022 at 2:50 AM Antoine Pitrou <anto...@python.org> wrote:
>>> 
>>> 
>>> Le 02/06/2022 à 00:02, Weston Pace a écrit :
>>>> 
>>>> I'd like to propose we add a second kernel function registry.  There
>>>> doesn't need to be any user facing API change.  We could probably use
>>>> an approach like [2] to proxy to the old function registry when the
>>>> newer registry doesn't contain the asked-for function.  This would
>>>> allow us to focus on creating an efficient function registry without
>>>> having to worry about refactoring the existing kernels all at once.
>>> 
>>> Why do we need a separate registry for this?

Reply via email to