Yesterday I did a very short test. Generally, SIMD (single instruction, multiple
data) works best with 16-byte aligned data and the restrict modifier to indicate
non-overlapping data. But even without those, some SIMD is indeed available in
Nim.

First we make the gcc output visible by redirecting it with &> into gcc.log, and
enable vectorization in nim.cfg:
    
    
    cat nim.cfg
    path:"$projectdir"
    nimcache:"/tmp/$projectdir"
    gcc.options.speed = "-save-temps -pipe -march=native -O3 -ftree-vectorize -fopt-info-vec -fno-strict-aliasing &> gcc.log"
    

We may also specify "-fopt-info-vec-missed" to see where vectorization failed,
but that generates a lot of noise for all the libraries. "-march=native"
ensures optimization for the current CPU, and "-save-temps" keeps the
intermediate files, including the assembler listings. Test with:
    
    
    import random
    proc test =
      var a: array[128, int]
      for i in 0 .. random(128):
        a[i] = i
      echo a[7]
    
    test()
    
    
    
    cat gcc.log
    gcc: warning: -pipe ignored because -save-temps specified
    /tmp//home/stefan/simd/simd.c:59:8: note: loop vectorized
    
    cat simd.s
            call    random_99297_4293377359
            testq   %rax, %rax
            js      .L13
            leaq    -3(%rax), %rcx
            leaq    1(%rax), %rdi
            shrq    $2, %rcx
            addq    $1, %rcx
            cmpq    $3, %rax
            leaq    0(,%rcx,4), %rdx
            jle     .L14
            vmovdqa .LC1(%rip), %ymm0
            xorl    %esi, %esi
            vmovdqa .LC3(%rip), %ymm1
    

"vmovdqa" is an AVX SIMD instruction (an aligned move of packed integers, here
into the 256-bit ymm registers). So vectorization works even with a non-fixed
upper bound for the for loop. I don't know if there is any benefit in real
life :)
