Hi Ben and Wes, Based on our discussion, I did the following. https://hastebin.com/ajadonados.cpp
It seems to be working fine. Would love to get your feedback on this! :-) But I have a couple of concerns. 1. Say I want to communicate the intermediate state data across multiple processes. Unfortunately, KernelState struct does not expose the data pointer to the outside. If say SumState is exposed, we could have accessed that data, isn't it? WDYT? 2. Polymorphism and virtual functions - Intuitively, a mean aggregation intermediate state would be a {T, int64} tuple. But I believe the size of struct "SumImpl : public ScalarAggregator (:public KernelState)" would be sizeof(T) + sizeof(int64) + sizeof(ScalarAggregator) + sizeof(vptr), isn't it? So, if I am managing a compute state container, this means that my memory requirement would be higher than simply using a {T, int64} tuple. Please correct me if I am wrong. I am not sure if there is a better solution to this, but just want to discuss it with you. On Tue, Nov 10, 2020 at 9:44 AM Wes McKinney <wesmck...@gmail.com> wrote: > Yes, open a Jira and propose a PR implementing the changes you need > > On Mon, Nov 9, 2020 at 8:31 PM Niranda Perera <niranda.per...@gmail.com> > wrote: > > > > @wes How should I proceed with this nevertheless? should I open a JIRA? > > > > On Mon, Nov 9, 2020 at 11:09 AM Wes McKinney <wesmck...@gmail.com> > wrote: > > > > > On Mon, Nov 9, 2020 at 9:32 AM Niranda Perera < > niranda.per...@gmail.com> > > > wrote: > > > > > > > > @Ben > > > > Thank you very much for the feedback. But unfortunately, I was > unable to > > > > find a header that exposes a SumAggregateKernel in the v2.0.0. Maybe > I am > > > > checking it wrong. I remember accessing them in v0.16 IINM. > > > > > > > > @Wes > > > > Yes, that would be great. How about adding a CMake compilation flag > for > > > > such dev use cases? > > > > > > > > > > This seems like it could cause more problems -- I think it would be > > > sufficient to use an "internal::" C++ namespace and always install the > > > relevant header file > > > > > > > > > > > > > > > On Sun, Nov 8, 2020 at 9:14 PM Wes McKinney <wesmck...@gmail.com> > wrote: > > > > > > > > > I'm not opposed to installing headers that provide access to some > of > > > > > the kernel implementation internals (with the caveat that changes > > > > > won't go through a deprecation cycle, so caveat emptor). It might > be > > > > > more sustainable to think about what kind of stable-ish public API > > > > > could be exported to support applications like Cylon. > > > > > > > > > > On Sun, Nov 8, 2020 at 10:37 AM Ben Kietzman < > b...@ursacomputing.com> > > > > > wrote: > > > > > > > > > > > > Hi Niranda, > > > > > > > > > > > > SumImpl is a subclass of KernelState. Given a SumAggregateKernel, > > > one can > > > > > > produce zeroed KernelState using the `init` member, then operate > on > > > data > > > > > > using the `consume`, `merge`, and `finalize` members. You can > look at > > > > > > ScalarAggExecutor for an example of how to get from a compute > > > function to > > > > > > kernels and kernel state. Will that work for you? > > > > > > > > > > > > Ben Kietzman > > > > > > > > > > > > On Sun, Nov 8, 2020, 11:21 Niranda Perera < > niranda.per...@gmail.com> > > > > > wrote: > > > > > > > > > > > > > Hi Ben, > > > > > > > > > > > > > > We are building a distributed table abstraction on top of Arrow > > > > > dataframes > > > > > > > called Cylon (https://github.com/cylondata/cylon). Currently > we > > > have a > > > > > > > simple aggregation and group-by operation implementation. But > we > > > felt > > > > > like > > > > > > > we can give more functionality if we can import arrow kernels > and > > > > > states to > > > > > > > corresponding cylon distributed kernels. > > > > > > > Ex: For distributed mean, we would have to communicate the > local > > > arrow > > > > > > > SumState and then do a SumImpl::MergeFrom() and the call > Finalize. > > > > > > > Is there any other way to access these intermediate states from > > > compute > > > > > > > operations? > > > > > > > > > > > > > > On Sun, Nov 8, 2020 at 11:11 AM Ben Kietzman < > > > b...@ursacomputing.com> > > > > > > > wrote: > > > > > > > > > > > > > > > Ni Niranda, > > > > > > > > > > > > > > > > What is the context of your work? if you're working inside > the > > > arrow > > > > > > > > repository you shouldn't need to install headers before using > > > them, > > > > > and > > > > > > > we > > > > > > > > welcome PRs for new kernels. Otherwise, could you provide > some > > > > > details > > > > > > > > about how your work is using Arrow as a dependency? > > > > > > > > > > > > > > > > Ben Kietzman > > > > > > > > > > > > > > > > On Sun, Nov 8, 2020, 10:57 Niranda Perera < > > > niranda.per...@gmail.com> > > > > > > > > wrote: > > > > > > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > > > > > I was wondering if I could use the > > > > > arrow/compute/kernels/*internal.h > > > > > > > > > headers in my work? I would like to reuse some of the > kernel > > > > > > > > > implementations and kernel states. > > > > > > > > > > > > > > > > > > With -DARROW_COMPUTE=ON, those headers are not added into > the > > > > > include > > > > > > > > dir. > > > > > > > > > I see that the *internal.h headers are skipped from > > > > > > > > > the ARROW_INSTALL_ALL_HEADERS cmake function unfortunately. > > > > > > > > > > > > > > > > > > Best > > > > > > > > > -- > > > > > > > > > Niranda Perera > > > > > > > > > @n1r44 <https://twitter.com/N1R44> > > > > > > > > > +1 812 558 8884 / +94 71 554 8430 > > > > > > > > > https://www.linkedin.com/in/niranda > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > Niranda Perera > > > > > > > @n1r44 <https://twitter.com/N1R44> > > > > > > > +1 812 558 8884 / +94 71 554 8430 > > > > > > > https://www.linkedin.com/in/niranda > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > Niranda Perera > > > > @n1r44 <https://twitter.com/N1R44> > > > > +1 812 558 8884 / +94 71 554 8430 > > > > https://www.linkedin.com/in/niranda > > > > > > > > > -- > > Niranda Perera > > @n1r44 <https://twitter.com/N1R44> > > +1 812 558 8884 / +94 71 554 8430 > > https://www.linkedin.com/in/niranda > -- Niranda Perera @n1r44 <https://twitter.com/N1R44> +1 812 558 8884 / +94 71 554 8430 https://www.linkedin.com/in/niranda