Regarding SIMD, you face the same challenges in Nim as in C or C++.
You can use the GCC/Clang builtin vector types, for example:
{.emit:"typedef float Float32x8 __attribute__ ((vector_size (32)));".}
type Float32x8 {.importc, bycopy.} = object
  raw: array[8, float32]
Or you can use the SIMD intrinsics.
But the implementation requires the same effort as in C or C++, except that
Nim metaprogramming makes it possible to generalize your SIMD implementations
across many architectures or SIMD widths.
For example, here are my SIMD definitions for SSE2 and AVX-512 matrix
multiplication, which allow me, in about a thousand lines of Nim code, to be
as fast as roughly 50x as many lines of pure assembly in OpenBLAS:
* <https://github.com/numforge/laser/blob/d1e6ae6/laser/primitives/matrix_multiplication/gemm_ukernel_avx512.nim#L10-L74>
* <https://github.com/numforge/laser/blob/d1e6ae6/laser/primitives/matrix_multiplication/gemm_ukernel_sse2.nim#L6-L129>
Note that this includes integer matrix multiplication, which no scientific
library implements efficiently (150x faster than NumPy or Julia on 1500x1500
matrices), and it is also easy to implement a fallback to scalar code.