Hi Wes,

On 16/10/2018 at 14:05, Wes McKinney wrote:
> hi folks,
>
> I explored a bit the performance implications of using validity
> bitmaps (like the Arrow columnar format) vs. sentinel values (like
> NaN, INT32_MIN) for nulls:
>
> http://wesmckinney.com/blog/bitmaps-vs-sentinel-values/
>
> The vectorization results may be of interest to those implementing
> analytic functions targeting the Arrow memory format. There's probably
> some other optimizations that can be employed, too.

This is a nice write-up. It may also be possible to speed things up
further using explicit SIMD operations. For the non-null case, it should
be relatively doable, see e.g.
https://p12tic.github.io/libsimdpp/v2.2-dev/libsimdpp/w/int/reduce_add.html
or
https://p12tic.github.io/libsimdpp/v2.2-dev/libsimdpp/w/fp/reduce_add.html

For the with-nulls case, it might be possible to do something with SIMD
masks, but I'm not competent to propose anything concrete :-)

Regards

Antoine.

>
> Caveat: it's entirely possible I made some mistakes in my code. I
> checked the various implementations for correctness only, and did not
> dig too deeply beyond that.
>
> - Wes