On Monday, 18 April 2016 at 00:27:06 UTC, Joe Duarte wrote:
On Tuesday, 5 April 2016 at 10:27:46 UTC, Walter Bright wrote:
Besides, I think it's a poor design to customize the app for
only one SIMD type. A better idea (I've repeated this ad
nauseam over the years) is to have n modules, one for each
supported SIMD type. Compile and link all of them in, then
detect the SIMD type at runtime and call the corresponding
module. (This is how the D array ops are currently
implemented.)
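Here is a minimal C sketch of the per-SIMD-module idea described above: there is one implementation per instruction set, and the running CPU is checked once before the matching one is called. It assumes GCC or Clang on x86. `__builtin_cpu_supports` and the `target("avx2")` attribute are GCC/Clang extensions. The names `add_arrays`, `add_scalar` and `add_avx2` are hypothetical and are not taken from D's actual array-op code.

```c
#include <assert.h>
#include <stddef.h>

/* Baseline version: plain C, runs on any CPU. */
static void add_scalar(int *dst, const int *a, const int *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define HAVE_X86_DISPATCH 1
/* Same loop compiled for AVX2, so the optimizer may use 256-bit vector
   instructions. Must only be called after the CPU check below succeeds. */
__attribute__((target("avx2")))
static void add_avx2(int *dst, const int *a, const int *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}
#endif

typedef void (*add_fn)(int *, const int *, const int *, size_t);

/* Ask the running CPU what it supports and return the best match. */
static add_fn select_add(void)
{
#ifdef HAVE_X86_DISPATCH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return add_avx2;
#endif
    return add_scalar;
}

/* Public entry point (hypothetical): resolves the implementation on first use. */
void add_arrays(int *dst, const int *a, const int *b, size_t n)
{
    /* Cached pointer; not thread-safe as written (a simplification). */
    static add_fn impl = NULL;
    assert(n == 0 || (dst && a && b));
    if (impl == NULL)
        impl = select_add();
    impl(dst, a, b, n);
}
```

All variants are compiled and linked into the same binary, and only the selection happens at run time. On a non-x86 target the guard removes the AVX2 path, so only the scalar version remains.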
There are many organizations in the world that are building
software in-house, where such software is targeted at modern
CPU SIMD types, most typically AVX/AVX2 and crypto instructions.
In addition, it's the COMPILER's work, not the programmer's!
The compiler SHOULD be able to vectorize the code using SSE/AVX
depending on a command-line switch. Why should I write all this
crap? Let the compiler do its work.
Also, the compiler CAN generate multiple versions of one function
using different SIMD instructions: the Intel C++ Compiler works this
way. It generates a few versions of a function, checks the CPU's
capabilities at run time, and executes the fastest one.
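GCC (6 and later) and recent Clang versions offer a comparable compiler-driven approach through the `target_clones` attribute. This is GCC/Clang's mechanism, not the Intel compiler's. With this attribute, the compiler emits one clone per listed target plus a resolver that chooses a clone when the symbol is bound at run time. The sketch below assumes x86-64 Linux with glibc, because the resolver depends on ifunc support. On other platforms the macro falls back to a single generic version. `scale_array` and `MULTIVERSION` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h> /* on glibc this pulls in <features.h>, defining __GLIBC__ */

/* Use target_clones only where the toolchain and C library support the
   ifunc-based dispatch it relies on (assumed: x86-64 Linux with glibc). */
#if defined(__has_attribute)
#  if __has_attribute(target_clones) && defined(__x86_64__) && defined(__GLIBC__)
#    define MULTIVERSION __attribute__((target_clones("avx2", "default")))
#  endif
#endif
#ifndef MULTIVERSION
#  define MULTIVERSION /* fallback: one generic version, no clones */
#endif

/* When MULTIVERSION expands to target_clones, the compiler builds an AVX2
   clone and a default clone of this loop, plus a resolver that picks one
   based on the CPU the program runs on. */
MULTIVERSION
void scale_array(int *dst, const int *src, int k, size_t n)
{
    assert(n == 0 || (dst && src));
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * k;
}
```

With this approach the source contains only one loop, and the cloning and dispatch are left to the compiler.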