On Wed, Apr 8, 2020 at 4:01 PM Wes McKinney <wesmck...@gmail.com> wrote:
>
> Another idea would be to have a variant with const-references instead
> of shared_ptr. One potential issue with our Datum is that it plays the
> dual role of transporting both input and output arguments. With
> outputs it's necessary to be able to convey ownership while with
> inputs this is less important.
>
> In general, I would say that some additional thought is needed about
> our kernel APIs. Some scattered thoughts about this:
>
> * Kernel "binding" (selecting a kernel given input types) is a bit ad
> hoc. Many kernels do not have the ability to tell you up front what
> type of data will be returned (this is very important in the context
> of a query engine runtime)

Another observation is that many of our kernels are unable to write
into preallocated memory. This is also a design issue that must be
addressed (kernels having the same output type invoked in succession
should be able to elide temporary allocations by writing into the same
output memory)

> * It's unclear that having a large collection of
> $FUNC(FunctionContext*, Datum $arg0, ..., Datum* out) is sustainable.
>
> Rather, it might be better to have something like:
>
> // We know the "schema" of each argument but don't have any data yet
> std::string kernel_name = "$name";
> auto bound_kernel = GetKernel(kernel_name, {arg0_shape, arg1_shape}, options);
> auto out = bound_kernel->Call({arg0, arg1});
>
> So here it would be
>
> ARROW_ASSIGN_OR_RAISE(auto take_kernel,
>                       GetKernel("take", {shapes::chunked_array(int8()),
>                                          shapes::chunked_array(int8())}));
>
> So now take_kernel->shape() should tell you that the expected output
> is shapes::chunked_array(int8())
>
> And Call should preferably accept reference arguments for its input 
> parameters.
>
> As far as kernel-specific options, we could create a variant that
> includes the different kernel-specific options structures, so that
> it's not necessary to have different dispatch functions for different
> kernels.
>
> Anyway, just some out loud thoughts, but while we are in this
> intermediate experimental state I'm supportive of you making the
> decision that is most convenient for the immediate problem you want to
> solve with the caveat that things may be refactored significantly in
> the proximate future.
>
> - Wes
>
> On Tue, Apr 7, 2020 at 10:42 AM Uwe L. Korn <uw...@xhochy.com> wrote:
> >
> > I did a bit more research on JIRA and we seem to have this open topic there 
> > also in https://issues.apache.org/jira/browse/ARROW-6959 which is the 
> > similar topic as my mail is about and in 
> > https://issues.apache.org/jira/browse/ARROW-7009 we wanted to remove some 
> > of the interfaces with reference-types.
> >
> > On Tue, Apr 7, 2020, at 1:00 PM, Uwe L. Korn wrote:
> > > Hello all,
> > >
> > > I'm in the progress of changing the implementation of the Take kernel
> > > to work on ChunkedArrays without concatenating them into a single Array
> > > first. While working on the implementation, I realised that we switch
> > > often between Datum and the specific-typed parameters. This works quite
> > > well for the combination of Array& and Datum(shared_ptr<ArrayData>) as
> > > here the reference object with type Array& always carries a shared
> > > reference with it, so switching between Array& and its Datum is quite
> > > easy.
> > >
> > > In contrast, we cannot do this with ChunkedArrays as here the Datum
> > > requires a shared_ptr<ChunkedArray> which cannot be constructed from
> > > the reference type. Thus to allow interfaces like `Status
> > > Take(FunctionContext* ctx, const ChunkedArray& values, const Array&
> > > indices,` to pass successfully their arguments to the Kernel
> > > implementation, we have to do:
> > >
> > > a) Remove the references from the interface of the Take() function and
> > > use `shared_ptr` instances everywhere.
> > > b) Add interfaces to kernels like the TakeKernel that allow calling
> > > with specific references instead of Datum instances
> > >
> > > Personally I would prefer b) as this allow us to make more use of the
> > > C++ type system and would also avoid the shared_ptr overhead where not
> > > necessary.
> > >
> > > Cheers,
> > > Uwe
> > >

Reply via email to