Regarding SIMD, you face the same challenges in Nim as in C or C++.
You can use the GCC/Clang builtin vector types, for example:
{.emit:"typedef float Float32x8 __attribute__ ((vector_size (32)));".}
type Float32x8 {.importc, bycopy.} = object
  raw: array[8, float32]
Or you can use the SIMD intrinsics.
But the implementation requires the same effort as in C or C++, except that
Nim metaprogramming makes it possible to generalize your SIMD implementations
across many architectures or SIMD widths.
For example, here are my SIMD definitions for SSE2 and AVX-512 matrix
multiplication, which allow me, in about a thousand lines of Nim code, to be
as fast as roughly 50x as many lines of pure assembly in OpenBLAS:
* <https://github.com/numforge/laser/blob/d1e6ae6/laser/primitives/matrix_multiplication/gemm_ukernel_avx512.nim#L10-L74>
* <https://github.com/numforge/laser/blob/d1e6ae6/laser/primitives/matrix_multiplication/gemm_ukernel_sse2.nim#L6-L129>
Note that this includes integer matrix multiplication, which no scientific
library implements efficiently (150x faster than NumPy or Julia on 1500x1500
matrices), and it is also easy to implement a fallback to scalar code.