Thanks, that helped a bit, but I'm not fully there. For now the first one works:

```nim
import nimsimd/avx2

when defined(gcc) or defined(clang):
  {.localPassc: "-mavx2".}

proc test(vals: seq[float32]) =
  let
    a = mm256_loadu_ps(vals[0].addr)
    b = mm256_set1_ps(-2.0)
    c = mm256_mul_ps(a, b)
    d = mm256_andnot_ps(mm256_set1_ps(-0.0), c) # abs
  echo cast[array[8, float32]](d)

test(@[1.0.float32, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
```
Instead of `mm256_load_ps(vals[0].addr)` I used `mm256_loadu_ps(vals[0].addr)`, following the document [Crunching Numbers with AVX, section 4.2](https://www.codeproject.com/articles/874396/crunching-numbers-with-avx-and-avx). It talks about memory alignment, and that confuses me. I thought the data was already aligned, since Zevv writes in his article on [Nim's memory model](https://zevv.nl/nim-memory/): "This is caused by something the compiler does which is called alignment, to make it easier for the CPU to access the data in memory".

The Crunching Numbers article then also says: "When loading data into vectors, memory alignment becomes particularly important. Each `_mm256_load_*` intrinsic accepts a memory address that must be aligned on a 32-byte boundary. That is, the address must be divisible by 32." This is followed by some C++ code.

How do I do that in Nim? Or how can I make sure a seq is aligned, so that `mm256_load_ps(vals[0].addr)` can be used?

(I also noticed that Zevv's examples now produce different output from `repr`, and the actual address is no longer shown.)
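
To make the "divisible by 32" condition from the article concrete, here is a minimal sketch of how I understand the check. The helper name `isAligned32` is my own and is not part of nimsimd, and the printed result depends on the allocator. As far as I can tell, nothing guarantees that a seq's data starts on a 32-byte boundary.

```nim
# Hypothetical helper (not from nimsimd): true when the address is divisible by 32.
proc isAligned32(p: pointer): bool =
  # For a power-of-two boundary, "address mod 32 == 0" equals "low 5 bits are zero".
  (cast[uint](p) and 31'u) == 0

var vals = @[1.0'f32, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]

# May print true or false: it depends on where the allocator placed the seq's data.
echo isAligned32(vals[0].addr)
```

If this prints `false`, the aligned `mm256_load_ps` would not be safe on that address, which is why the unaligned `mm256_loadu_ps` is the one that works for me here.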