On Saturday, 21 March 2015 at 19:35:02 UTC, Walter Bright wrote:
I know I shouldn't, but I'll bite. Show me the "low level C code" that effectively uses SIMD vector registers.
You are right, you should not bite. C code is superfluous; this is a general issue with efficient parallel computation. You want to avoid dependencies between the values packed into a single register.
E.g. take a recurrence relation and write an efficient SIMD implementation of it. You might need to expand the terms so that you have N independent formulas. If it uses floating point, you will have to be careful about drift between the N formulas that are computed in parallel.
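
To make the expansion concrete, here is a rough sketch in plain C. Everything in it is an assumption chosen for illustration: the recurrence x[n] = a*x[n-1] + b, the lane count N = 4, and the function names. Substituting the recurrence into itself four times gives x[n] = A*x[n-4] + B with A = a^4 and B = b*(1 + a + a^2 + a^3), so four consecutive elements no longer depend on each other and a compiler may be able to map the inner loop onto one vector register.

```c
#include <stddef.h>

#define LANES 4 /* assumed vector width: four floats per register */

/* Reference version: every step waits for the previous result. */
void recurrence_serial(float a, float b, float x0, float *x, size_t n)
{
    float prev = x0;
    for (size_t i = 0; i < n; i++) {
        prev = a * prev + b;
        x[i] = prev;
    }
}

/* Expanded version: x[i] = A * x[i - LANES] + B, so the LANES elements
 * of one block are independent of each other. */
void recurrence_lanes(float a, float b, float x0, float *x, size_t n)
{
    float A = 1.0f; /* becomes a^LANES */
    float B = 0.0f; /* becomes b * (1 + a + ... + a^(LANES-1)) */
    float prev = x0;

    /* Seed the first LANES elements serially and build A and B. */
    for (size_t i = 0; i < LANES; i++) {
        if (i < n) {
            prev = a * prev + b;
            x[i] = prev;
        }
        B = a * B + b;
        A *= a;
    }

    /* Full blocks: no dependency between the LANES iterations of k. */
    size_t i = LANES;
    for (; i + LANES <= n; i += LANES) {
        for (size_t k = 0; k < LANES; k++)
            x[i + k] = A * x[i + k - LANES] + B;
    }

    /* Leftover elements when n is not a multiple of LANES. */
    for (; i < n; i++)
        x[i] = A * x[i - LANES] + B;
}

/* Largest absolute difference between two results: a crude drift check. */
float max_abs_diff(const float *p, const float *q, size_t n)
{
    float worst = 0.0f;
    for (size_t i = 0; i < n; i++) {
        float d = p[i] - q[i];
        if (d < 0.0f)
            d = -d;
        if (d > worst)
            worst = d;
    }
    return worst;
}
```

The two versions round differently, because the expanded one uses the precomputed A and B instead of applying a and b at every step. Running both on the same input and looking at max_abs_diff is one simple way to see how far the lanes drift from the serial result for a given a and b.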
