That approach looks great and very much in line with some of the stuff we have in light_array.h so I think it's very compatible. If you have the time to push this refactoring through then go for it. Don't let anything I'm saying deter any ongoing efforts.
I'm just advocating that we be open to any kind of incremental approach. I looked a little closer at this today and you and Antoine are correct that "two function registries" is maybe not the right extension point. Instead it might be that we have two classes of scalar functions. For example, the "at most three buffers" ArraySpan that you've linked only works for types without children. We do have some scalar functions that run on nested types (e.g. count, index, etc.) My point is that rather than force ourselves to deal with all of these corner cases we should be open to an incremental solution. Maybe there is a ScalarPrimitive and a ScalarNested kernel type or Scalar and ScalarSpan. Also, if we know we are also going to want to tweak the output interface (I don't know for sure if we will) then maybe it makes sense to pick a small set of kernels and incrementally improve that small set until we think we've made all the changes we are going to want for the near future, and then promote those changes to the wider set of kernels. On Thu, Jun 2, 2022 at 8:07 AM Wes McKinney <wesmck...@gmail.com> wrote: > > On this topic, I actually have started prototyping a new ScalarKernel > exec interface that uses a non-owning, shared_ptr-free "ArraySpan" > data structure based on some prior conversations > > https://github.com/wesm/arrow/blob/711fd5e5665c280540bbaf48a48ca1eca1b91bff/cpp/src/arrow/compute/exec.h#L163 > https://github.com/wesm/arrow/blob/711fd5e5665c280540bbaf48a48ca1eca1b91bff/cpp/src/arrow/compute/kernel.h#L552 > > My idea was to refactor just the existing scalar kernels (which by my > estimation seems tractable since so many go through the common > templated kernel generators) and leave the other kinds of kernels > untouched so that they can be refactored in due course. I hadn't > planned to try to tackle changing the "output" interface from > shared_ptr<ArrayData> because there are some kernels that create new > ArrayData and some that don't. > > If this seems like a reasonable path, I can try to have a first draft > PR ready to go maybe by Monday (I was going to work on this over the > weekend when I can have some uninterrupted time to do the > refactoring). I'm not sure that a new registry is going to be needed > > On Thu, Jun 2, 2022 at 2:50 AM Antoine Pitrou <anto...@python.org> wrote: > > > > > > Le 02/06/2022 à 00:02, Weston Pace a écrit : > > > > > > I'd like to propose we add a second kernel function registry. There > > > doesn't need to be any user facing API change. We could probably use > > > an approach like [2] to proxy to the old function registry when the > > > newer registry doesn't contain the asked-for function. This would > > > allow us to focus on creating an efficient function registry without > > > having to worry about refactoring the existing kernels all at once. > > > > Why do we need a separate registry for this?