Hi Wes,

I am interested in this. In this PR [1] we are exposing BitmapWordReader/Writer [2] to the outside, which may help with the 'batch-at-a-time' scenario.
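For anyone who wants to experiment, here is a minimal standalone sketch of the branch-free loop from the quoted message. It does not use Arrow's API: GetBit, SetBitTo and FilterBranchFree are hypothetical local stand-ins, and the int32_t value type and zero bit offsets are assumptions made for illustration. One thing it makes visible is that the branch-free form stores on every iteration, so the output buffers have to be sized for the full input length rather than for the number of selected rows.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for the bit helpers (not Arrow's BitUtil).
inline bool GetBit(const uint8_t* bits, int64_t i) {
  return (bits[i >> 3] >> (i & 7)) & 1;
}

// Writes `value` to bit i without branching on `value`.
inline void SetBitTo(uint8_t* bits, int64_t i, bool value) {
  const int shift = static_cast<int>(i & 7);
  const uint8_t mask = static_cast<uint8_t>(1u << shift);
  const uint8_t bit = static_cast<uint8_t>(static_cast<unsigned>(value) << shift);
  bits[i >> 3] = static_cast<uint8_t>((bits[i >> 3] & ~mask) | bit);
}

// Copies values[i] to out_data for every i whose filter bit is set and marks
// the corresponding output slot valid. Returns the number of values written.
// out_data must hold `length` values and out_is_valid `length` bits, because
// every iteration stores to the current output slot, selected or not.
inline int64_t FilterBranchFree(const int32_t* values, const uint8_t* filter,
                                int64_t length, int32_t* out_data,
                                uint8_t* out_is_valid) {
  assert(length >= 0);
  int64_t out_position = 0;
  for (int64_t in_position = 0; in_position < length; ++in_position) {
    const bool advance = GetBit(filter, in_position);
    SetBitTo(out_is_valid, out_position, advance);
    out_data[out_position] = values[in_position];  // overwritten if not selected
    out_position += static_cast<int64_t>(advance);
  }
  return out_position;
}
```

Calling it with ten values and a filter selecting positions 1, 3, 4 and 9 returns 4, with the selected values packed at the front of out_data. Whether this beats the branching version will depend on filter selectivity and on the target architecture, so it needs benchmarking.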

[1] https://github.com/apache/arrow/pull/10487
[2] https://github.com/apache/arrow/blob/bcce18e5d4d83f0831de71b363ad91470376084c/cpp/src/arrow/util/bitmap_reader.h#L149-L231

On Wed, Jun 23, 2021 at 11:21 AM Wes McKinney <wesmck...@gmail.com> wrote:

> One project I was interested in getting to but haven't had the time
> was introducing branch-free code into vector_selection.cc and reducing
> the use of if-statements to try to improve performance.
>
> One way to do this is to take code that looks like this:
>
> if (BitUtil::GetBit(filter_data_, filter_offset_ + in_position)) {
>   BitUtil::SetBit(out_is_valid_, out_offset_ + out_position_);
>   out_data_[out_position_++] = values_data_[in_position];
> }
> ++in_position;
>
> and change it to a branch-free version
>
> bool advance = BitUtil::GetBit(filter_data_, filter_offset_ + in_position);
> BitUtil::SetBitTo(out_is_valid_, out_offset_ + out_position_, advance);
> out_data_[out_position_] = values_data_[in_position];
> out_position_ += advance; // may need static_cast<int> here
> ++in_position;
>
> Since more people are working on kernels and computing now, I thought
> this might be an interesting project for someone to explore and see
> what improvements are possible (and what the differences between e.g.
> x86 and ARM architecture are like when it comes to reducing
> branching). Another thing to look at might be batch-at-a-time
> bitpacking in the output bitmap versus bit-at-a-time.
>

--
Niranda Perera
https://niranda.dev/
@n1r44 <https://twitter.com/N1R44>