Hi Wes,

Le 16/10/2018 à 14:05, Wes McKinney a écrit :
> hi folks,
> 
> I explored a bit the performance implications of using validity
> bitmaps (like the Arrow columnar format) vs. sentinel values (like
> NaN, INT32_MIN) for nulls:
> 
> http://wesmckinney.com/blog/bitmaps-vs-sentinel-values/
> 
> The vectorization results may be of interest to those implementing
> analytic functions targeting the Arrow memory format. There's probably
> some other optimizations that can be employed, too.

This is a nice write-up.  It may also possible to further speed up
things using explicit SIMD operations.

For the non-null case, it should be relatively doable, see e.g.
https://p12tic.github.io/libsimdpp/v2.2-dev/libsimdpp/w/int/reduce_add.html
or
https://p12tic.github.io/libsimdpp/v2.2-dev/libsimdpp/w/fp/reduce_add.html .

For the with-nulls case, it might be possible to do something with SIMD
masks, but I'm not competent to propose anything concrete :-)

Regards

Antoine.


> 
> Caveat: it's entirely possible I made some mistakes in my code. I
> checked the various implementations for correctness only, and did not
> dig too deeply beyond that.
> 
> - Wes
> 

Reply via email to