A few thoughts on this at a high level:

1. Most of the libraries don't support runtime dispatch (libsimdpp seems to be the exception here), so we should decide whether we want to roll our own dynamic dispatch mechanism.

2. It isn't clear to me from the linked PR what the performance delta is between the explicit SIMD code and what the compiler would generate on its own. For simple aggregates of non-null data I would expect pretty good auto-vectorization, and compiler auto-vectorization seems to get better over time. For instance, the scalar example linked in the paper seems to get at least partially vectorized under Clang 10 (https://godbolt.org/z/oPopQL).

3. There appear to be some efforts toward a standardized C++ SIMD library [1], which might be based on Vc.
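To make the dispatch question in point 1 a bit more concrete, here is a rough sketch of what a hand-rolled mechanism could look like. It assumes GCC or Clang on x86, where `__builtin_cpu_supports` and the `target` function attribute are available; all function names (`SumScalar`, `SumAvx2`, `ResolveSum`, `Sum`) are made up for illustration and are not taken from the linked PR or from any of the libraries mentioned. Both variants are the same plain loop, so any vectorization is left to the compiler.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical names throughout; this is an illustration, not existing project code.
using SumFn = int64_t (*)(const int32_t*, std::size_t);

// Baseline: a plain scalar loop that the compiler may auto-vectorize
// for whatever instruction set the translation unit is built for.
int64_t SumScalar(const int32_t* values, std::size_t n) {
  int64_t total = 0;
  for (std::size_t i = 0; i < n; ++i) total += values[i];
  return total;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
// The same loop, but the compiler is allowed to use AVX2 for this one
// function only (assumption: GCC/Clang `target` attribute on x86).
__attribute__((target("avx2")))
int64_t SumAvx2(const int32_t* values, std::size_t n) {
  int64_t total = 0;
  for (std::size_t i = 0; i < n; ++i) total += values[i];
  return total;
}
#endif

// Pick an implementation based on what the running CPU supports.
SumFn ResolveSum() {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
  if (__builtin_cpu_supports("avx2")) return SumAvx2;
#endif
  return SumScalar;
}

// Public entry point: the CPU check runs once, on the first call.
int64_t Sum(const int32_t* values, std::size_t n) {
  static const SumFn impl = ResolveSum();
  return impl(values, n);
}
```

Callers would only ever see `Sum`; the per-ISA variants stay an internal detail. On other compilers or architectures the sketch simply falls back to the scalar loop.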
My initial thought is that in the short term we should focus on the dynamic dispatch question (continue to build our own vs. adopt an existing library) and lean on the compiler for most vectorization. Use of intrinsics should be limited to complex numerical functions and places where the compiler fails to vectorize or translate well (e.g. bit manipulations).

If we do find the need for a dedicated library, I would lean towards something that will converge to a standard, to reduce additional dependencies in the long run. That being said, most of these libraries seem to be header-only, so the dependency is fairly lightweight and we can vendor them if need be.

[1] https://en.cppreference.com/w/cpp/experimental/simd

On Tue, Jun 9, 2020 at 3:32 AM Antoine Pitrou <anto...@python.org> wrote:
>
> Thank you. xsimd used to require C++14, but apparently they have
> demoted it to C++11. Good!
>
> Regards
>
> Antoine.
>
>
> On 09/06/2020 at 12:04, Maarten Breddels wrote:
> > Hi Antoine,
> >
> > Adding xsimd to the list of options:
> > * https://github.com/xtensor-stack/xsimd
> > Not sure how it compares to the rest though.
> >
> > cheers,
> >
> > Maarten
> >
>