Also, to meet alignment requirements in C++, we sometimes process the unaligned leading/trailing data separately to reach the required alignment.
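A minimal sketch of that head/body/tail approach, written in Rust because the thread concerns the Rust kernels. It is not the Arrow C++ code. The function name `count_set_bits` is hypothetical; `align_to` is the standard-library slice method that splits a slice into an unaligned prefix, an aligned middle, and an unaligned suffix.

```rust
/// Counts the set bits in `bytes`. The unaligned leading and trailing bytes
/// are handled one byte at a time, so the main loop only touches words that
/// are aligned for `u64`.
fn count_set_bits(bytes: &[u8]) -> u64 {
    // SAFETY: every bit pattern is a valid `u64`, so viewing the aligned
    // middle part of a byte slice as `u64` words is sound.
    let (head, body, tail) = unsafe { bytes.align_to::<u64>() };

    let mut count = 0u64;
    // Unaligned leading bytes, processed individually.
    for b in head {
        count += u64::from(b.count_ones());
    }
    // Aligned middle part, processed 64 bits at a time.
    for w in body {
        count += u64::from(w.count_ones());
    }
    // Leftover trailing bytes that do not fill a whole word.
    for b in tail {
        count += u64::from(b.count_ones());
    }
    count
}
```

The result does not depend on where the slice starts, because any bytes that cannot be viewed as aligned words simply end up in `head` or `tail`.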
On Monday, August 3, 2020, Wes McKinney <wesmck...@gmail.com> wrote:

> We handle arbitrary slices in C++ and it hasn't seemed especially
> burdensome. We have some utilities (e.g. see
> arrow/util/bit_block_counter.h) to facilitate efficiently iterating
> through bitmaps 64 bits at a time (even when slices on an unaligned
> offset) and associated bit-by-bit iterators (e.g. BitmapReader,
> BitmapWriter)
>
> On Mon, Aug 3, 2020 at 7:38 AM Jörn Horstmann
> <joern.horstm...@signavio.com> wrote:
> >
> > While investigating an issue regarding offset handling in the rust
> > arithmetic kernels (https://issues.apache.org/jira/browse/ARROW-9583),
> > I started to wonder how the other implementations are handling compute
> > on buffer slices.
> >
> > The rust implementation currently allows creating slices of arrays
> > starting at arbitrary aligned offsets. This becomes a problem with
> > boolean arrays and with the null bitmaps, since operations on those
> > are currently working with whole bytes as the smallest unit. There
> > could be several options to solve this, all adding additional
> > complexity or having other downsides:
> >
> > - calculate null bitmaps bit by bit if not properly aligned, leading
> > to a big performance drop
> > - calculate null bitmaps on whole bytes and then try to rotate the
> > resulting buffer by a certain number of bits. quite complex code and
> > also some performance overhead
> > - disallow compute kernels on non-aligned buffers, at least if null
> > bitmaps are involved
> >
> > I'm leaning towards the last option, a draft PR is at
> > https://github.com/apache/arrow/pull/7854
> >
> > Another issue with offsets is that, at least in the rust
> > implementation, some simd kernels currently assume the whole buffer to
> > be aligned to 64 bytes. As soon as there is an offset that is not a
> > multiple of 64, this could lead to unsafe out of bounds reads and
> > writes of memory.
> >
> > I'm very interested in how the C++ and Java implementations handle those
> > issues.
> >
> > --
> > Jörn Horstmann | Senior Backend Engineer
> >
> > www.signavio.com
> > Kurfürstenstraße 111, 10787 Berlin, Germany
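To make the "64 bits at a time, even at an unaligned offset" idea from the quoted reply concrete, here is a rough Rust sketch of combining two validity bitmaps whose slices start at different bit offsets. It is only an illustration under stated assumptions, not the code in arrow/util/bit_block_counter.h or in the Rust crate: the names `get_bit`, `load_word`, and `bitmap_and` are hypothetical, and the bitmaps are assumed to use Arrow's least-significant-bit-first numbering.

```rust
/// Reads bit `i` of an LSB-first bitmap (bit 0 is the lowest bit of byte 0).
fn get_bit(bitmap: &[u8], i: usize) -> bool {
    (bitmap[i / 8] >> (i % 8)) & 1 == 1
}

/// Loads 64 bits starting at an arbitrary bit offset.
/// Bits past the end of `bitmap` are read as zero.
fn load_word(bitmap: &[u8], bit_offset: usize) -> u64 {
    let byte = bit_offset / 8;
    let shift = (bit_offset % 8) as u32;
    // Gather up to 9 bytes: 8 for the word plus 1 for the bits shifted in.
    let src = bitmap.get(byte..).unwrap_or(&[]);
    let n = src.len().min(9);
    let mut buf = [0u8; 16];
    buf[..n].copy_from_slice(&src[..n]);
    // Shift the sub-byte offset away, then keep the low 64 bits.
    (u128::from_le_bytes(buf) >> shift) as u64
}

/// ANDs `len` bits of two bitmaps that start at independent bit offsets.
/// The result starts at bit offset 0 and has its unused trailing bits cleared.
fn bitmap_and(
    left: &[u8],
    left_offset: usize,
    right: &[u8],
    right_offset: usize,
    len: usize,
) -> Vec<u8> {
    let mut out = Vec::with_capacity((len + 7) / 8);
    let mut done = 0;
    while done < len {
        let bits = (len - done).min(64);
        let mut word =
            load_word(left, left_offset + done) & load_word(right, right_offset + done);
        if bits < 64 {
            // Last, partial word: clear everything past the logical length.
            word &= (1u64 << bits) - 1;
        }
        out.extend_from_slice(&word.to_le_bytes()[..(bits + 7) / 8]);
        done += bits;
    }
    out
}
```

Compared with the bit-by-bit option in the list above, this does one shift per 64 bits of input instead of one per bit, and it never needs to rotate the output buffer because the result is always written starting at offset 0.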