On Sunday, 22 March 2015 at 03:43:33 UTC, Walter Bright wrote:
> I.e. there isn't low level C code that effectively uses SIMD vector registers. You have to use the auto-vectorizer, which tries to reconstruct high level operations out of C low level code, then recompile.
I don't think low-level hardware registers qualify as "high level constructs", which is the term you used. Besides, all major C compilers ship with built-in vector types and support the hardware vendors' standard SIMD intrinsics. Even if you dismiss those, a less sophisticated contemporary compiler can still generate SIMD instructions for carefully hand-unrolled expressions. And even without explicit SIMD instructions, the superscalar nature of desktop CPUs requires you to break dependencies to avoid bubbles in the pipeline.
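The following sketch shows what "breaking dependencies" by manual unrolling looks like. The function names are hypothetical and chosen for illustration only.

In the naive sum, every addition has to wait for the previous one. The unrolled version keeps four independent accumulators, so four addition chains can be in flight at once, and the pattern maps naturally onto SIMD lanes. Under strict IEEE floating-point semantics, compilers typically will not reassociate the naive loop into this form on their own (for example, without something like `-ffast-math`), because doing so can change rounding.

```c
#include <stddef.h>

/* Naive: one long dependency chain, each += waits on the previous result. */
double sum_naive(const double *x, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s;
}

/* Hand-unrolled with four independent accumulators: the four chains do not
   depend on each other, so a superscalar CPU (or SIMD lanes) can overlap them. */
double sum_unrolled4(const double *x, size_t n) {
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += x[i];
        s1 += x[i + 1];
        s2 += x[i + 2];
        s3 += x[i + 3];
    }
    for (; i < n; i++)      /* leftover elements when n is not a multiple of 4 */
        s0 += x[i];
    return (s0 + s1) + (s2 + s3);  /* combine the partial sums at the end */
}
```

Both functions compute the same sum. The results can differ in the last bits for general floating-point input, which is exactly why the compiler is not free to make this transformation itself.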
So to optimize filling an array with the Fibonacci sequence, a plain high-level library generator is insufficient. You also need the closed formula for fib(x), so that you can generate sequences in parallel: e.g. compute fib(0), fib(1), … in parallel with fib(N), fib(N+1), etc.
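For reference, the closed formula meant here is presumably Binet's formula. The rounding form below follows because |ψ^n/√5| < 1/2 for all n ≥ 0. Once F(N) and F(N+1) are known, a second stream can run the ordinary recurrence independently of the stream starting at F(0), F(1):

```latex
F(n) = \frac{\varphi^{n} - \psi^{n}}{\sqrt{5}}, \qquad
\varphi = \frac{1+\sqrt{5}}{2}, \quad \psi = \frac{1-\sqrt{5}}{2}

F(n) = \operatorname{round}\!\left(\frac{\varphi^{n}}{\sqrt{5}}\right), \qquad n \ge 0

F(N+k+2) = F(N+k+1) + F(N+k), \qquad k = 0, 1, 2, \dots
```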
Without the closed formula to obtain fib(N-2) and fib(N-1), a regular optimizer simply cannot break the dependencies as effectively as a handwritten low-level loop.
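A minimal sketch of the two-stream idea follows. The names `fib_closed` and `fib_fill_two_streams` are hypothetical.

- **Seeding.** The closed formula is evaluated in `double`, which is only exact for small n. The sketch therefore assumes the array length is at most 90, so the seeds needed are at most fib(46).
- **Filling.** The first half of the array comes from fib(0), fib(1). The second half starts from fib(len/2) and fib(len/2 + 1), obtained via the closed formula. Each loop iteration advances both recurrences, and the two additions do not depend on each other.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: Binet's formula, fib(n) = round(phi^n / sqrt(5)).
   Assumption: only used for n <= 46 here, where double precision is exact enough. */
static uint64_t fib_closed(unsigned n) {
    const double sqrt5 = 2.2360679774997896964;
    const double phi = (1.0 + sqrt5) / 2.0;
    double p = 1.0, b = phi;
    for (unsigned e = n; e != 0; e >>= 1) {  /* phi^n by repeated squaring */
        if (e & 1u)
            p *= b;
        b *= b;
    }
    return (uint64_t)(p / sqrt5 + 0.5);      /* round to nearest */
}

/* Hypothetical: fill out[0..len) with fib(0..len-1) using two independent
   recurrence streams. Returns 0 on success, -1 if len exceeds the assumed limit. */
int fib_fill_two_streams(uint64_t *out, size_t len) {
    if (len > 90)
        return -1;
    if (len == 0)
        return 0;
    size_t half = len / 2;                       /* stream B starts at index half */
    uint64_t a0 = 0, a1 = 1;                     /* stream A: fib(i), fib(i+1) */
    uint64_t b0 = fib_closed((unsigned)half);    /* stream B seeded by closed formula */
    uint64_t b1 = fib_closed((unsigned)half + 1);
    for (size_t i = 0; i < half; i++) {
        out[i] = a0;
        out[half + i] = b0;
        uint64_t an = a0 + a1;                   /* these two additions are independent */
        uint64_t bn = b0 + b1;
        a0 = a1; a1 = an;
        b0 = b1; b1 = bn;
    }
    if (len % 2)                                 /* odd length: stream B owns the last slot */
        out[len - 1] = b0;
    return 0;
}
```

With only the plain recurrence, there is no way to start the second stream without first computing everything before it. The closed-form seed is what removes that serial dependency. A real implementation for larger n would need an exact seeding method (e.g. integer arithmetic) instead of `double`.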
