On 3/21/2015 2:08 PM, "Ola Fosheim Grøstad" <[email protected]> wrote:
On Saturday, 21 March 2015 at 19:35:02 UTC, Walter Bright wrote:
I know I shouldn't, but I'll bite. Show me the "low level C code" that
effectively uses SIMD vector registers.
You are right, you should not bite. C code is superfluous; this is a general
issue with efficient parallel computations. You want to avoid dependencies
within a single register.
E.g., take a recurrence relation and make an efficient SIMD implementation for
it. You might need to try to expand the terms so you have N independent
formulas. If it uses floating point you will have to be careful about drift
between the N formulas that are computed in parallel.
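To make the expansion concrete, here is a rough sketch in C, assuming the simple linear recurrence x[i] = a*x[i-1] + b (my choice for illustration, not something from the thread). The function names and the lane count of 4 are hypothetical. The serial version has a dependency from every step to the next; the expanded version steps four independent formulas by a stride of 4, which is the shape that maps onto a 4-wide vector register.

```c
#include <assert.h>
#include <stddef.h>

/* Serial reference: x[i] = a * x[i-1] + b.
   Every iteration depends on the previous one. */
void recurrence_serial(float a, float b, float x0, float *out, size_t n)
{
    float x = x0;
    for (size_t i = 0; i < n; i++) {
        x = a * x + b;
        out[i] = x;
    }
}

/* Expanded form with 4 independent lanes (hypothetical lane count).
   Applying the recurrence four times gives
   x[i+4] = a^4 * x[i] + b * (1 + a + a^2 + a^3),
   so each lane advances without reading any other lane. */
void recurrence_lanes4(float a, float b, float x0, float *out, size_t n)
{
    assert(n % 4 == 0);            /* sketch only: no tail handling */

    float a2 = a * a;
    float a4 = a2 * a2;
    float b4 = b * (1.0f + a + a2 + a2 * a);

    /* Seed the lanes serially with x[1]..x[4]. */
    float lane[4];
    float x = x0;
    for (int k = 0; k < 4; k++) {
        x = a * x + b;
        lane[k] = x;
    }

    for (size_t i = 0; i < n; i += 4) {
        /* The 4 lane updates are independent of each other. */
        for (int k = 0; k < 4; k++) {
            out[i + k] = lane[k];
            lane[k] = a4 * lane[k] + b4;
        }
    }
}
```

The two functions are algebraically equivalent, but with floating point they round differently, so for general a and b the outputs should be compared with a tolerance rather than for exact equality. That difference is the drift between the N formulas mentioned above.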
I.e. there isn't low level C code that effectively uses SIMD vector registers.
You have to use the auto-vectorizer, which tries to reconstruct high-level
operations out of low-level C code, then recompile.
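For illustration, this is the kind of plain scalar loop an auto-vectorizer is usually expected to recognize. It is a minimal sketch: the function name is hypothetical, and whether vector instructions are actually emitted depends on the compiler, target, and optimization level. The restrict qualifiers (C99) tell the compiler the arrays do not overlap, which removes one common obstacle to vectorization.

```c
#include <stddef.h>

/* Element-wise add written as ordinary scalar C. Nothing here names a
   vector register; the compiler has to infer the vector operation
   from the loop. */
void add_arrays(float *restrict dst,
                const float *restrict a,
                const float *restrict b,
                size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}
```

The source expresses only one addition per iteration; any use of SIMD registers comes from the compiler's reconstruction, not from the C code itself.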