Thanks Sasha — this is helpful. I'm going to take a college try at
just the scalar kernels and see what I can accomplish over the next
few days — will attempt to get a PR up for review with the C++ tests
passing. I'm expecting assorted workarounds for the various kernels
that do zero-copy optimizations (setting output buffers with input
buffers — such optimizations should likely be carried out elsewhere),
etc., but I will keep notes about challenges that I run into to assist
with further refactoring and improvements.

On Fri, Jun 3, 2022 at 1:20 PM Sasha Krassovsky
<krassovskysa...@gmail.com> wrote:
>
> Hi all,
> I’ve been thinking about some sort of refactoring of this registry for a 
> while now, and I’ve written down some thoughts, please leave your comments.
>
> https://docs.google.com/document/d/1LAN9I_Y9cZaG2a84j1wLY8jSlK3gDXYMle-VtyFCAE8/edit?usp=sharing
>  
> <https://docs.google.com/document/d/1LAN9I_Y9cZaG2a84j1wLY8jSlK3gDXYMle-VtyFCAE8/edit?usp=sharing>
>
> I agree with Weston that starting with a small set of kernels and drilling 
> down the implementation until we have a design with the characteristics we 
> want seems like a good idea. One point to consider is the tradeoff between 
> the size of our set of guinea pig kernels and how representative they are of 
> the rest. We will probably need to cover nested types as Weston said, 
> aggregations, etc.
>
> Sasha
>
> > On Jun 3, 2022, at 2:35 AM, Weston Pace <weston.p...@gmail.com> wrote:
> >
> > That approach looks great and very much in line with some of the stuff
> > we have in light_array.h so I think it's very compatible.  If you have
> > the time to push this refactoring through then go for it.  Don't let
> > anything I'm saying deter any ongoing efforts.
> >
> > I'm just advocating that we be open to any kind of incremental
> > approach.  I looked a little closer at this today and you and Antoine
> > are correct that "two function registries" is maybe not the right
> > extension point.  Instead it might be that we have two classes of
> > scalar functions.  For example, the "at most three buffers" ArraySpan
> > that you've linked only works for types without children.  We do have
> > some scalar functions that run on nested types (e.g. count, index,
> > etc.)  My point is that rather than force ourselves to deal with all
> > of these corner cases we should be open to an incremental solution.
> > Maybe there is a ScalarPrimitive and a ScalarNested kernel type or
> > Scalar and ScalarSpan.
> >
> > Also, if we know we are also going to want to tweak the output
> > interface (I don't know for sure if we will) then maybe it makes sense
> > to pick a small set of kernels and incrementally improve that small
> > set until we think we've made all the changes we are going to want for
> > the near future, and then promote those changes to the wider set of
> > kernels.
> >
> > On Thu, Jun 2, 2022 at 8:07 AM Wes McKinney <wesmck...@gmail.com> wrote:
> >>
> >> On this topic, I actually have started prototyping a new ScalarKernel
> >> exec interface that uses a non-owning, shared_ptr-free "ArraySpan"
> >> data structure based on some prior conversations
> >>
> >> https://github.com/wesm/arrow/blob/711fd5e5665c280540bbaf48a48ca1eca1b91bff/cpp/src/arrow/compute/exec.h#L163
> >> https://github.com/wesm/arrow/blob/711fd5e5665c280540bbaf48a48ca1eca1b91bff/cpp/src/arrow/compute/kernel.h#L552
> >>
> >> My idea was to refactor just the existing scalar kernels (which by my
> >> estimation seems tractable since so many go through the common
> >> templated kernel generators) and leave the other kinds of kernels
> >> untouched so that they can be refactored in due course. I hadn't
> >> planned to try to tackle changing the "output" interface from
> >> shared_ptr<ArrayData> because there are some kernels that create new
> >> ArrayData and some that don't.
> >>
> >> If this seems like a reasonable path, I can try to have a first draft
> >> PR ready to go maybe by Monday (I was going to work on this over the
> >> weekend when I can have some uninterrupted time to do the
> >> refactoring). I'm not sure that a new registry is going to be needed
> >>
> >> On Thu, Jun 2, 2022 at 2:50 AM Antoine Pitrou <anto...@python.org> wrote:
> >>>
> >>>
> >>> Le 02/06/2022 à 00:02, Weston Pace a écrit :
> >>>>
> >>>> I'd like to propose we add a second kernel function registry.  There
> >>>> doesn't need to be any user facing API change.  We could probably use
> >>>> an approach like [2] to proxy to the old function registry when the
> >>>> newer registry doesn't contain the asked-for function.  This would
> >>>> allow us to focus on creating an efficient function registry without
> >>>> having to worry about refactoring the existing kernels all at once.
> >>>
> >>> Why do we need a separate registry for this?
>

Reply via email to