2016Q1: std.blas

Ilya Yaroshenko via Digitalmars-d-announce Sat, 26 Dec 2015 12:00:43 -0800

Hi,

I will write GEMM and GEMV families of BLAS for Phobos.


Goals:
 - code without assembler
 - code based on SIMD instructions
 - DMD/LDC/GDC support
 - kernel-based architecture like OpenBLAS
 - 85-100% of OpenBLAS FLOPS (OpenBLAS = 100%)
 - tiny generic code base compared with OpenBLAS
 - ability to define user kernels
 - allocator support: GEMM requires small internal allocations
 - @nogc nothrow pure template functions (depends on allocator)
 - optional multithreading
 - ability to work with `Slice` multidimensional arrays when the stride between elements in a vector is greater than 1. In common BLAS, the matrix stride between rows or between columns always equals 1.
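
To make the last goal concrete, here is a minimal sketch of a matrix-vector product that accepts arbitrary strides in both matrix dimensions and in both vectors. It is written in plain C purely for illustration. The name `gemv_strided` and its parameter list are hypothetical and are not the proposed std.blas API. Element (i, j) of the matrix is assumed to live at `a[i*row_stride + j*col_stride]`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a BLAS or std.blas function): y = A * x,
   where A is m-by-n and element (i, j) is a[i*row_stride + j*col_stride].
   Neither matrix stride is required to be 1. */
static void gemv_strided(size_t m, size_t n,
                         const double *a,
                         ptrdiff_t row_stride, ptrdiff_t col_stride,
                         const double *x, ptrdiff_t x_stride,
                         double *y, ptrdiff_t y_stride)
{
    assert(a != NULL && x != NULL && y != NULL);
    for (size_t i = 0; i < m; ++i) {
        double acc = 0.0;
        for (size_t j = 0; j < n; ++j) {
            /* Both strides take part in the address computation. */
            acc += a[(ptrdiff_t)i * row_stride + (ptrdiff_t)j * col_stride]
                 * x[(ptrdiff_t)j * x_stride];
        }
        y[(ptrdiff_t)i * y_stride] = acc;
    }
}
```

Swapping `row_stride` and `col_stride` (together with `m` and `n`) gives the transposed product without copying the data, which is the kind of flexibility a strided `Slice` view needs.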


Implementation details:

LDC all : very generic D/LLVM IR kernels. AVX/AVX2/AVX-512/NEON support works out of the box.

DMD/GDC x86   : kernels for  8 XMM registers based on core.simd
DMD/GDC x86_64: kernels for 16 XMM registers based on core.simd

DMD/GDC other : generic kernels without SIMD instructions. AVX/AVX2/AVX-512 support can be added in the future.
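
As a rough illustration of what "kernel-based" means here, the sketch below splits C += A*B into small tiles and hands each tile to a micro-kernel passed in as a function pointer, so a user-supplied kernel can replace the generic one. It is plain C and only a sketch under simplifying assumptions: row-major storage, no panel packing, no cache blocking, and no SIMD. All names (`micro_kernel_fn`, `generic_kernel`, `gemm_tiled`, `MR`, `NR`) are hypothetical and do not describe the planned std.blas code.

```c
#include <assert.h>
#include <stddef.h>

enum { MR = 4, NR = 4 };  /* micro-tile size; illustrative values only */

/* A micro-kernel updates one mr-by-nr tile of C: C_tile += A_rows * B_cols.
   mr <= MR and nr <= NR; edge tiles may be smaller than MR-by-NR. */
typedef void (*micro_kernel_fn)(size_t mr, size_t nr, size_t k,
                                const double *a, size_t lda,
                                const double *b, size_t ldb,
                                double *c, size_t ldc);

/* Generic kernel without SIMD: accumulate the tile locally, then add it
   to C once. A SIMD kernel would keep `acc` in vector registers. */
static void generic_kernel(size_t mr, size_t nr, size_t k,
                           const double *a, size_t lda,
                           const double *b, size_t ldb,
                           double *c, size_t ldc)
{
    double acc[MR][NR] = {{0}};
    for (size_t p = 0; p < k; ++p)
        for (size_t i = 0; i < mr; ++i)
            for (size_t j = 0; j < nr; ++j)
                acc[i][j] += a[i * lda + p] * b[p * ldb + j];
    for (size_t i = 0; i < mr; ++i)
        for (size_t j = 0; j < nr; ++j)
            c[i * ldc + j] += acc[i][j];
}

/* Driver: C (m x n) += A (m x k) * B (k x n), all row-major.
   The caller chooses the kernel, e.g. generic_kernel or a custom one. */
static void gemm_tiled(size_t m, size_t n, size_t k,
                       const double *a, size_t lda,
                       const double *b, size_t ldb,
                       double *c, size_t ldc,
                       micro_kernel_fn kernel)
{
    assert(kernel != NULL);
    for (size_t i = 0; i < m; i += MR) {
        size_t mr = (m - i < MR) ? m - i : MR;
        for (size_t j = 0; j < n; j += NR) {
            size_t nr = (n - j < NR) ? n - j : NR;
            kernel(mr, nr, k,
                   a + i * lda, lda,
                   b + j, ldb,
                   c + i * ldc + j, ldc);
        }
    }
}
```

A call such as `gemm_tiled(m, n, k, a, k, b, n, c, n, generic_kernel)` runs the generic path. A real implementation along the lines of [1] would also pack panels of A and B into contiguous buffers, which is where the small internal allocations come from.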


References:

[1] Anatomy of High-Performance Matrix Multiplication: http://www.cs.utexas.edu/users/pingali/CS378/2008sp/papers/gotoPaper.pdf

[2] OpenBLAS  https://github.com/xianyi/OpenBLAS

Happy New Year!

Ilya
