Also for alignment requirements in C++ we sometimes process unaligned
leading/trailing data to reach the required alignment.


On Monday, August 3, 2020, Wes McKinney <wesmck...@gmail.com> wrote:

> We handle arbitrary slices in C++ and it hasn't seemed especially
> burdensome. We have some utilities (e.g. see
> arrow/util/bit_block_counter.h) to facilitate efficiently iterating
> through bitmaps 64 bits at a time (even when slices on an unaligned
> offset) and associated bit-by-bit iterators (e.g. BitmapReader,
> BitmapWriter)
>
> On Mon, Aug 3, 2020 at 7:38 AM Jörn Horstmann
> <joern.horstm...@signavio.com> wrote:
> >
> > While investigating an issue regarding offset handling in the rust
> > arithmetic kernels (https://issues.apache.org/jira/browse/ARROW-9583),
> > I started to wonder how the other implementations are handling compute
> > on buffer slices.
> >
> > The rust implementation currently allows creating slices of arrays
> > starting at arbitrary aligned offsets. This becomes a problem with
> > boolean arrays and with the null bitmaps, since operations on those
> > are currently working with whole bytes as the smallest unit. There
> > could be several options to solve this, all adding additional
> > complexity or having other downsides:
> >
> > - calculate null bitmaps bit by bit if not properly aligned, leading
> > to a big performance drop
> > - calculate null bitmaps on whole bytes and then try to rotate the
> > resulting buffer by a certain number of bits. quite complex code and
> > also some performance overhead
> > - disallow compute kernels on non-aligned buffers, at least if null
> > bitmaps are involved
> >
> > I'm leaning towards the last option, a draft PR is at
> > https://github.com/apache/arrow/pull/7854
> >
> > Another issue with offsets is that, at least in the rust
> > implementation, some simd kernels currently assume the whole buffer to
> > be aligned to 64 bytes. As soon as there is an offset that is not a
> > multiple of 64, this could lead to unsafe out of bounds reads and
> > writes of memory.
> >
> > I'm very interested in how the C++ and Java implementations handle those
> issues.
> >
> > --
> > Jörn Horstmann | Senior Backend Engineer
> >
> > www.signavio.com
> > Kurfürstenstraße 111, 10787 Berlin, Germany
>

Reply via email to