I've made good progress on this the last couple days and I'm at the point of writing the new ScalarExecutor that uses the new ArraySpan/ExecSpan concepts, having done the initial "porting" of the scalar kernels to use the new data structures. I'm tired so I'm stopping for today, but I'm going to try the next couple days to get the codebase compiling again and the unit tests passing, which will hopefully go smoothly since we have lots of tests but we will see how it goes.
The refactoring was extremely gnarly (especially scalar_if_else.cc) and I have a growing list of notes about how we can improve the array kernels codebase. In particular, we should definitely eliminate "scalar" kernel outputs. This only gets triggered when all inputs are scalars, but in that case we should instead promote all inputs to arrays of length 1 (some kernels already do this!) and use the array code paths. This would allow us to delete a ton of tedious code and simplify our maintenance burden — there is no performance critical need for all-scalar function evaluation, so the performance hit of up-promoting scalars to arrays-of-length-1 is totally worth it for the ability to delete a ton of code. There are some critical unanswered questions around how we will handle the dichotomy between kernels that allocate memory and those that don't — I am hopeful that in subsequent PRs we can establish a scheme wherein we can use the ArraySpan data structure even for non-primitive outputs, but some kind of allocated-buffer bookkeeping concept in the KernelContext so that the executor layer can know when it needs to destroy a buffer that is no longer needed, rather than relying on the ArrayData destructor to release memory. My branch is here, but I would say it's a couple days away from being review-ready: https://github.com/apache/arrow/compare/master...wesm:lightweight-exec-batch I'll post a PR when I have something closer to a green build. We probably won't want to let this PR linger since it will cause conflicts with any PRs touching scalar kernels. On Fri, Jun 3, 2022 at 1:59 PM Wes McKinney <wesmck...@gmail.com> wrote: > > Thanks Sasha — this is helpful. I'm going to take a college try at > just the scalar kernels and see what I can accomplish over the next > few days — will attempt to get a PR up for review with the C++ tests > passing. I'm expecting assorted workarounds for the various kernels > that do zero-copy optimizations (setting output buffers with input > buffers — such optimizations should likely be carried out elsewhere), > etc., but I will keep notes about challenges that I run into to assist > with further refactoring and improvements. > > On Fri, Jun 3, 2022 at 1:20 PM Sasha Krassovsky > <krassovskysa...@gmail.com> wrote: > > > > Hi all, > > I’ve been thinking about some sort of refactoring of this registry for a > > while now, and I’ve written down some thoughts, please leave your comments. > > > > https://docs.google.com/document/d/1LAN9I_Y9cZaG2a84j1wLY8jSlK3gDXYMle-VtyFCAE8/edit?usp=sharing > > > > <https://docs.google.com/document/d/1LAN9I_Y9cZaG2a84j1wLY8jSlK3gDXYMle-VtyFCAE8/edit?usp=sharing> > > > > I agree with Weston that starting with a small set of kernels and drilling > > down the implementation until we have a design with the characteristics we > > want seems like a good idea. One point to consider is the tradeoff between > > the size of our set of guinea pig kernels and how representative they are > > of the rest. We will probably need to cover nested types as Weston said, > > aggregations, etc. > > > > Sasha > > > > > On Jun 3, 2022, at 2:35 AM, Weston Pace <weston.p...@gmail.com> wrote: > > > > > > That approach looks great and very much in line with some of the stuff > > > we have in light_array.h so I think it's very compatible. If you have > > > the time to push this refactoring through then go for it. Don't let > > > anything I'm saying deter any ongoing efforts. > > > > > > I'm just advocating that we be open to any kind of incremental > > > approach. I looked a little closer at this today and you and Antoine > > > are correct that "two function registries" is maybe not the right > > > extension point. Instead it might be that we have two classes of > > > scalar functions. For example, the "at most three buffers" ArraySpan > > > that you've linked only works for types without children. We do have > > > some scalar functions that run on nested types (e.g. count, index, > > > etc.) My point is that rather than force ourselves to deal with all > > > of these corner cases we should be open to an incremental solution. > > > Maybe there is a ScalarPrimitive and a ScalarNested kernel type or > > > Scalar and ScalarSpan. > > > > > > Also, if we know we are also going to want to tweak the output > > > interface (I don't know for sure if we will) then maybe it makes sense > > > to pick a small set of kernels and incrementally improve that small > > > set until we think we've made all the changes we are going to want for > > > the near future, and then promote those changes to the wider set of > > > kernels. > > > > > > On Thu, Jun 2, 2022 at 8:07 AM Wes McKinney <wesmck...@gmail.com> wrote: > > >> > > >> On this topic, I actually have started prototyping a new ScalarKernel > > >> exec interface that uses a non-owning, shared_ptr-free "ArraySpan" > > >> data structure based on some prior conversations > > >> > > >> https://github.com/wesm/arrow/blob/711fd5e5665c280540bbaf48a48ca1eca1b91bff/cpp/src/arrow/compute/exec.h#L163 > > >> https://github.com/wesm/arrow/blob/711fd5e5665c280540bbaf48a48ca1eca1b91bff/cpp/src/arrow/compute/kernel.h#L552 > > >> > > >> My idea was to refactor just the existing scalar kernels (which by my > > >> estimation seems tractable since so many go through the common > > >> templated kernel generators) and leave the other kinds of kernels > > >> untouched so that they can be refactored in due course. I hadn't > > >> planned to try to tackle changing the "output" interface from > > >> shared_ptr<ArrayData> because there are some kernels that create new > > >> ArrayData and some that don't. > > >> > > >> If this seems like a reasonable path, I can try to have a first draft > > >> PR ready to go maybe by Monday (I was going to work on this over the > > >> weekend when I can have some uninterrupted time to do the > > >> refactoring). I'm not sure that a new registry is going to be needed > > >> > > >> On Thu, Jun 2, 2022 at 2:50 AM Antoine Pitrou <anto...@python.org> wrote: > > >>> > > >>> > > >>> Le 02/06/2022 à 00:02, Weston Pace a écrit : > > >>>> > > >>>> I'd like to propose we add a second kernel function registry. There > > >>>> doesn't need to be any user facing API change. We could probably use > > >>>> an approach like [2] to proxy to the old function registry when the > > >>>> newer registry doesn't contain the asked-for function. This would > > >>>> allow us to focus on creating an efficient function registry without > > >>>> having to worry about refactoring the existing kernels all at once. > > >>> > > >>> Why do we need a separate registry for this? > >