Dandandan commented on issue #8879:
URL: https://github.com/apache/arrow-rs/issues/8879#issuecomment-3566945811
> An alternative to separate `take` implementations would be to introduce an
> abstraction for the indices, similar to what `OffsetBuffer` is doing for
> list/string offsets.
>
>     pub struct IndicesBuffer<I: ArrowNativeType + Integer> {
>         indices: ScalarBuffer<I>,
>         /// The maximum length that can be indexed by the values in `indices`.
>         /// This is usually one more than the maximum index, or 0 if `indices`
>         /// is empty.
>         max_indexed_len: usize,
>     }
>
> Creating this `IndicesBuffer` could be done either safely or unsafely, and
> the `take` kernels can do a very quick check against `max_indexed_len` to
> ensure they do not access out of bounds. This would also be nice for use cases
> like `sort_to_indices`, where the function already knows the maximum, because
> it is just reordering an existing range.
That sounds like a great idea that avoids most of the overhead while not
introducing much unsafety.
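
To make the idea concrete, here is a minimal std-only sketch of how I read the
proposal. It is not arrow-rs code: `Vec<u32>` stands in for `ScalarBuffer<I>`,
and the `new`, `new_unchecked`, and `take` signatures are hypothetical. The
point is that the bounds check happens once per call rather than once per
index.

```rust
/// Hypothetical stand-in for the proposed `IndicesBuffer`.
/// Assumption: `Vec<u32>` replaces arrow's `ScalarBuffer<I>` so the sketch
/// compiles without any external crate.
pub struct IndicesBuffer {
    indices: Vec<u32>,
    /// One more than the largest index, or 0 if `indices` is empty.
    max_indexed_len: usize,
}

impl IndicesBuffer {
    /// Safe constructor: scans the indices once to compute the bound.
    /// (Assumes `usize` is wider than 32 bits, so the `+ 1` cannot overflow.)
    pub fn new(indices: Vec<u32>) -> Self {
        let max_indexed_len = indices.iter().max().map_or(0, |&m| m as usize + 1);
        Self { indices, max_indexed_len }
    }

    /// Unchecked constructor for callers that already know the bound,
    /// e.g. a sort that only permutes the range `0..len`.
    ///
    /// # Safety
    /// Every value in `indices` must be strictly less than `max_indexed_len`.
    pub unsafe fn new_unchecked(indices: Vec<u32>, max_indexed_len: usize) -> Self {
        Self { indices, max_indexed_len }
    }

    pub fn max_indexed_len(&self) -> usize {
        self.max_indexed_len
    }
}

/// Hypothetical `take`: one up-front comparison, then unchecked reads.
/// Returns `None` if some index could fall outside `values`.
pub fn take<T: Copy>(values: &[T], indices: &IndicesBuffer) -> Option<Vec<T>> {
    if indices.max_indexed_len > values.len() {
        return None;
    }
    Some(
        indices
            .indices
            .iter()
            // SAFETY: every index < max_indexed_len <= values.len().
            .map(|&i| unsafe { *values.get_unchecked(i as usize) })
            .collect(),
    )
}
```

With this shape, a `sort_to_indices`-style producer could call `new_unchecked`
with the input length and skip the scan entirely, while untrusted indices go
through `new`.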