nim simd (avx2) How to get going?

ingo Sun, 28 Jul 2024 00:30:35 -0700

Thanks, that helped a bit, but I'm not fully there. For now the first one works:
    
    
    import nimsimd/avx2
    
    when defined(gcc) or defined(clang):
      {.localPassc: "-mavx2".}
    
    
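    proc test(vals: seq[float32]) =
      # vals needs at least 8 elements; loadu reads 8 floats without requiring alignment
      let
        a = mm256_loadu_ps(vals[0].addr)
        b = mm256_set1_ps(-2.0)
        c = mm256_mul_ps(a, b)
        d = mm256_andnot_ps(mm256_set1_ps(-0.0), c) # abs: clear the sign bit in each lane
      echo cast[array[8, float32]](d)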
    
    test(@[1.0.float32, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
    
    


Instead of `mm256_load_ps(vals[0].addr)` I used `mm256_loadu_ps(vals[0].addr)`, 
following the document [Crunching Numbers with AVX, section 
4.2](https://www.codeproject.com/articles/874396/crunching-numbers-with-avx-and-avx). 
It talks about memory alignment, which confuses me. I thought the data was 
already aligned, since Zevv writes in his article on [Nim's memory 
model](https://zevv.nl/nim-memory/): "This is caused by something the compiler 
does which is called alignment, to make it easier for the CPU to access the 
data in memory".
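
To make the confusion concrete, here is a minimal sketch for inspecting what alignment the seq buffer actually has. As far as I can tell, the alignment Zevv describes is the natural alignment of each type (4 bytes for `float32`), which is weaker than the 32-byte boundary the aligned AVX loads ask for. The variable names are only for illustration.

```nim
# Natural alignment the compiler guarantees for the element type.
echo "alignof(float32) = ", alignof(float32)

var vals = @[1.0.float32, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]

# Remainder of the buffer address modulo 32; 0 would mean 32-byte aligned.
# The value depends on the allocator and can differ between runs and builds.
let remainder = cast[uint](addr vals[0]) mod 32
echo "address mod 32 = ", remainder
```

If the remainder is not 0, the buffer is only naturally aligned, and the unaligned load `mm256_loadu_ps` is the safe choice.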

The crunching numbers article then also says: "When loading data into vectors, 
memory alignment becomes particularly important. Each _mm256_load_* intrinsic 
accepts a memory address that must be aligned on a 32-byte boundary. That is, 
the address must be divisible by 32." followed by some C++ code. How do I do 
that in Nim? Or how can I make sure a seq is aligned, so that 
`mm256_load_ps(vals[0].addr)` can be used?
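
Two possible ways to get a 32-byte aligned buffer, as a sketch rather than a definitive answer: the `align` pragma on a fixed-size array, and a seq with a few spare elements where the first aligned element is used as the start. The helpers `isAligned32` and `elemsToAlign32` are hypothetical names introduced here, not part of nimsimd or the standard library, and the padding approach assumes the seq is never resized afterwards.

```nim
# Hypothetical helper: true when p sits on a 32-byte boundary.
proc isAligned32(p: pointer): bool =
  (cast[uint](p) and 31'u) == 0

# Hypothetical helper: number of float32 elements to skip from p to reach
# the next 32-byte boundary (0..7, assuming p is at least 4-byte aligned).
proc elemsToAlign32(p: pointer): int =
  let misalign = cast[uint](p) and 31'u
  int(((32'u - misalign) and 31'u) div 4'u)

# Option 1: fixed-size buffer placed on a 32-byte boundary by the align pragma.
var fixedBuf {.align(32).}: array[8, float32] =
  [1.0.float32, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]

# Option 2: seq with 7 spare elements; start at the first aligned element.
var storage = newSeq[float32](8 + 7)
let start = elemsToAlign32(addr storage[0])
for i in 0 ..< 8:
  storage[start + i] = float32(i + 1)

echo isAligned32(addr fixedBuf[0])
echo isAligned32(addr storage[start])
```

With either buffer, `addr fixedBuf[0]` or `addr storage[start]` should be usable where an aligned pointer is expected, such as `mm256_load_ps`. Appending to `storage` may reallocate it, so the offset would have to be recomputed after any resize.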

(I also noticed that Zevv's examples now produce different output from `repr`, 
and the actual address is no longer shown.)
