We handle arbitrary slices in C++ and it hasn't seemed especially
burdensome. We have some utilities (e.g. see
arrow/util/bit_block_counter.h) to facilitate efficiently iterating
through bitmaps 64 bits at a time (even when a slice starts at an
unaligned offset), along with associated bit-by-bit iterators (e.g.
BitmapReader, BitmapWriter).
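
To give a rough idea of the word-at-a-time approach, here is a minimal
sketch that counts the set bits in a bitmap slice starting at an arbitrary
bit offset. It is not the actual Arrow code: LoadWord, ReadBits64 and
CountSetBits are hypothetical names made up for this illustration, and the
sketch assumes the LSB-first bit order used by Arrow validity bitmaps.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an Arrow API): load up to 8 bytes starting at
// byte_pos as a little-endian 64-bit word, zero-padding past the buffer end.
inline uint64_t LoadWord(const uint8_t* data, int64_t nbytes, int64_t byte_pos) {
  uint64_t word = 0;
  for (int i = 0; i < 8 && byte_pos + i < nbytes; ++i) {
    word |= static_cast<uint64_t>(data[byte_pos + i]) << (8 * i);
  }
  return word;
}

// Hypothetical helper: return the 64 bits that start at bit position bit_pos.
// When bit_pos is not a multiple of 8, the word is shifted down and the
// missing high bits are taken from the following (9th) byte.
inline uint64_t ReadBits64(const uint8_t* data, int64_t nbytes, int64_t bit_pos) {
  const int64_t byte_pos = bit_pos / 8;
  const int shift = static_cast<int>(bit_pos % 8);
  const uint64_t lo = LoadWord(data, nbytes, byte_pos);
  if (shift == 0) return lo;
  const uint64_t hi = (byte_pos + 8 < nbytes) ? data[byte_pos + 8] : 0;
  return (lo >> shift) | (hi << (64 - shift));
}

// Count the set bits in the slice [offset, offset + length) of a bitmap,
// processing 64 bits per step and masking off the final partial word.
inline int64_t CountSetBits(const uint8_t* data, int64_t nbytes,
                            int64_t offset, int64_t length) {
  assert(offset >= 0 && length >= 0 && offset + length <= nbytes * 8);
  int64_t count = 0;
  int64_t pos = offset;
  const int64_t end = offset + length;
  for (; end - pos >= 64; pos += 64) {
    count += static_cast<int64_t>(
        std::bitset<64>(ReadBits64(data, nbytes, pos)).count());
  }
  if (pos < end) {
    uint64_t word = ReadBits64(data, nbytes, pos);
    word &= (uint64_t{1} << (end - pos)) - 1;  // keep only the remaining bits
    count += static_cast<int64_t>(std::bitset<64>(word).count());
  }
  return count;
}
```

The same unaligned 64-bit read can feed a binary kernel (e.g. AND-ing two
validity bitmaps whose slices have different offsets), so only the trailing
partial word needs bit-level handling.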

On Mon, Aug 3, 2020 at 7:38 AM Jörn Horstmann
<joern.horstm...@signavio.com> wrote:
>
> While investigating an issue regarding offset handling in the rust
> arithmetic kernels (https://issues.apache.org/jira/browse/ARROW-9583),
> I started to wonder how the other implementations are handling compute
> on buffer slices.
>
> The rust implementation currently allows creating slices of arrays
> starting at arbitrary aligned offsets. This becomes a problem with
> boolean arrays and with the null bitmaps, since operations on those
> are currently working with whole bytes as the smallest unit. There
> could be several options to solve this, all adding additional
> complexity or having other downsides:
>
> - calculate null bitmaps bit by bit if not properly aligned, leading
> to a big performance drop
> - calculate null bitmaps on whole bytes and then try to rotate the
> resulting buffer by a certain number of bits. quite complex code and
> also some performance overhead
> - disallow compute kernels on non-aligned buffers, at least if null
> bitmaps are involved
>
> I'm leaning towards the last option, a draft PR is at
> https://github.com/apache/arrow/pull/7854
>
> Another issue with offsets is that, at least in the rust
> implementation, some simd kernels currently assume the whole buffer to
> be aligned to 64 bytes. As soon as there is an offset that is not a
> multiple of 64, this could lead to unsafe out of bounds reads and
> writes of memory.
>
> I'm very interested in how the C++ and Java implementations handle those 
> issues.
>
> --
> Jörn Horstmann | Senior Backend Engineer
>
> www.signavio.com
> Kurfürstenstraße 111, 10787 Berlin, Germany
