Yesterday I did a very short test. Generally, SIMD (single instruction, multiple
data) works best with 16-byte-aligned data and the restrict qualifier to indicate
non-overlapping data. But even without these, some SIMD is indeed available in
Nim.
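For reference, here is a minimal C sketch of what "16-byte aligned plus restrict" means at the level gcc sees. It is not taken from Nim's generated code; the names add_one and run_demo are made up for illustration, and it assumes a C11 compiler for alignas.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define N 128

/* restrict promises the compiler that dst and src never overlap,
   so it does not have to guard the vectorized loop against aliasing. */
void add_one(int *restrict dst, const int *restrict src, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] + 1;
}

/* Hypothetical demo helper: fills two 16-byte-aligned arrays,
   runs the loop and returns one element of the result. */
int run_demo(void) {
    alignas(16) int src[N];
    alignas(16) int dst[N];

    /* Both arrays start on a 16-byte boundary. */
    assert((uintptr_t)src % 16 == 0 && (uintptr_t)dst % 16 == 0);

    for (int i = 0; i < N; i++)
        src[i] = i;
    add_one(dst, src, N);
    return dst[7];
}
```

Compiling this with gcc and "-O3 -fopt-info-vec" should show whether the loop in add_one was vectorized; the exact report depends on the gcc version and target CPU.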
First we make gcc output visible with &> output redirection and enable
vectorization:
cat nim.cfg
path:"$projectdir"
nimcache:"/tmp/$projectdir"
gcc.options.speed = "-save-temps -pipe -march=native -O3 -ftree-vectorize -fopt-info-vec -fno-strict-aliasing &> gcc.log"
We may also specify "-fopt-info-vec-missed" to see where vectorization failed,
but that generates a lot of noise for all the libs. "-march=native" ensures
optimization for the current CPU, and "-save-temps" keeps the intermediate
files, including the assembler listings. Test with:
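import random

proc test =
  var a: array[128, int]
  # random(128) yields 0..127, so i stays within the array bounds
  # (newer Nim versions use rand(127) instead)
  for i in 0 .. random(128):
    a[i] = i
  echo a[7]

test()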
cat gcc.log
gcc: warning: -pipe ignored because -save-temps specified
/tmp//home/stefan/simd/simd.c:59:8: note: loop vectorized
cat simd.s
call random_99297_4293377359
testq %rax, %rax
js .L13
leaq -3(%rax), %rcx
leaq 1(%rax), %rdi
shrq $2, %rcx
addq $1, %rcx
cmpq $3, %rax
leaq 0(,%rcx,4), %rdx
jle .L14
vmovdqa .LC1(%rip), %ymm0
xorl %esi, %esi
vmovdqa .LC3(%rip), %ymm1
"vmovdqa" is a SIMD instruction (an aligned move of packed integers, here on the
256-bit %ymm registers). So it works even with a non-fixed upper bound for the
for loop. I don't know if there is any benefit in real life :)
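To try the same thing with gcc directly, without going through Nim, the test loop can be sketched in C roughly like this. This is my own approximation, not the simd.c that Nim generated; fill_and_read is a made-up name, and long stands in for Nim's int on the assumption of a 64-bit Linux target.

```c
#define LEN 128

/* Hypothetical C analogue of the Nim test: write the index into
   a[0..upper], where upper is only known at run time, then read a[7]. */
long fill_and_read(long upper) {
    long a[LEN] = {0};          /* zero-initialized, like a Nim array */

    /* Clamp once up front so the loop keeps a single, simple bound. */
    if (upper > LEN - 1)
        upper = LEN - 1;

    for (long i = 0; i <= upper; i++)
        a[i] = i;

    return a[7];
}
```

Compiling with the same options as in nim.cfg above should show in the "-fopt-info-vec" output whether gcc vectorizes this loop too.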