On Wed, Apr 8, 2020 at 4:01 PM Wes McKinney <wesmck...@gmail.com> wrote: > > Another idea would be to have a variant with const-references instead > of shared_ptr. One potential issue with our Datum is that it plays the > dual role of transporting both input and output arguments. With > outputs it's necessary to be able to convey ownership while with > inputs this is less important. > > In general, I would say that some additional thought is needed about > our kernel APIs. Some scattered thoughts about this: > > * Kernel "binding" (selecting a kernel given input types) is a bit ad > hoc. Many kernels do not have the ability to tell you up front what > type of data will be returned (this is very important in the context > of a query engine runtime)
Another observation is that many of our kernels are unable to write into preallocated memory. This is also a design issue that must be addressed (kernels having the same output type invoked in succession should be able to elide temporary allocations by writing into the same output memory) > * It's unclear that having a large collection of > $FUNC(FunctionContext*, Datum $arg0, ..., Datum* out) is sustainable. > > Rather, it might be better to have something like: > > // We know the "schema" of each argument but don't have any data yet > std::string kernel_name = "$name"; > auto bound_kernel = GetKernel(kernel_name, {arg0_shape, arg1_shape}, options); > auto out = bound_kernel->Call({arg0, arg1}); > > So here it would be > > ARROW_ASSIGN_OR_RAISE(auto take_kernel, > GetKernel("take", {shapes::chunked_array(int8()), > shapes::chunked_array(int8())})); > > So now take_kernel->shape() should tell you that the expected output > is shapes::chunked_array(int8()) > > And Call should preferably accept reference arguments for its input > parameters. > > As far as kernel-specific options, we could create a variant that > includes the different kernel-specific options structures, so that > it's not necessary to have different dispatch functions for different > kernels. > > Anyway, just some out loud thoughts, but while we are in this > intermediate experimental state I'm supportive of you making the > decision that is most convenient for the immediate problem you want to > solve with the caveat that things may be refactored significantly in > the proximate future. > > - Wes > > On Tue, Apr 7, 2020 at 10:42 AM Uwe L. Korn <uw...@xhochy.com> wrote: > > > > I did a bit more research on JIRA and we seem to have this open topic there > > also in https://issues.apache.org/jira/browse/ARROW-6959 which is the > > similar topic as my mail is about and in > > https://issues.apache.org/jira/browse/ARROW-7009 we wanted to remove some > > of the interfaces with reference-types. > > > > On Tue, Apr 7, 2020, at 1:00 PM, Uwe L. Korn wrote: > > > Hello all, > > > > > > I'm in the progress of changing the implementation of the Take kernel > > > to work on ChunkedArrays without concatenating them into a single Array > > > first. While working on the implementation, I realised that we switch > > > often between Datum and the specific-typed parameters. This works quite > > > well for the combination of Array& and Datum(shared_ptr<ArrayData>) as > > > here the reference object with type Array& always carries a shared > > > reference with it, so switching between Array& and its Datum is quite > > > easy. > > > > > > In contrast, we cannot do this with ChunkedArrays as here the Datum > > > requires a shared_ptr<ChunkedArray> which cannot be constructed from > > > the reference type. Thus to allow interfaces like `Status > > > Take(FunctionContext* ctx, const ChunkedArray& values, const Array& > > > indices,` to pass successfully their arguments to the Kernel > > > implementation, we have to do: > > > > > > a) Remove the references from the interface of the Take() function and > > > use `shared_ptr` instances everywhere. > > > b) Add interfaces to kernels like the TakeKernel that allow calling > > > with specific references instead of Datum instances > > > > > > Personally I would prefer b) as this allow us to make more use of the > > > C++ type system and would also avoid the shared_ptr overhead where not > > > necessary. > > > > > > Cheers, > > > Uwe > > >