On Sunday, 27 December 2015 at 10:28:53 UTC, Russel Winder wrote:
On Sat, 2015-12-26 at 19:57 +0000, Ilya Yaroshenko via Digitalmars-d-announce wrote:

I will write GEMM and GEMV families of BLAS for Phobos.

  - code without assembler
  - code based on SIMD instructions
  - DMD/LDC/GDC support
  - kernel-based architecture like OpenBLAS
  - 85-100% of OpenBLAS FLOPS (OpenBLAS = 100%)
  - tiny generic code compared with OpenBLAS
  - ability to define user kernels
  - allocators support. GEMM requires small internal allocations.
  - @nogc nothrow pure template functions (depends on allocator)
  - optional multithreading
  - ability to work with `Slice` multidimensional arrays when the
stride between elements in a vector is greater than 1. In common
BLAS, one of the matrix strides (between rows or between columns) always equals 1.
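
To make the stride point concrete, here is a minimal sketch in C. It is not std.blas code and not part of any BLAS: `strided_gemv` and all of its parameter names are hypothetical, chosen only for illustration. It computes y = A * x where both the row stride and the column stride of A are arbitrary, which a classic BLAS `gemv` with a single leading dimension cannot express without copying.

```c
#include <stddef.h>

/* Hypothetical helper (an assumption for illustration, not a real BLAS routine).
   Computes y = A * x for an m-by-n matrix whose element (i, j) is stored at
   a[i * row_stride + j * col_stride]. Classic BLAS fixes one of these two
   strides to 1; here both are free. */
void strided_gemv(size_t m, size_t n,
                  const double *a, ptrdiff_t row_stride, ptrdiff_t col_stride,
                  const double *x, ptrdiff_t incx,
                  double *y, ptrdiff_t incy)
{
    for (size_t i = 0; i < m; ++i) {
        double acc = 0.0;
        for (size_t j = 0; j < n; ++j) {
            /* Index with both strides instead of assuming contiguous rows. */
            acc += a[(ptrdiff_t)i * row_stride + (ptrdiff_t)j * col_stride]
                 * x[(ptrdiff_t)j * incx];
        }
        y[(ptrdiff_t)i * incy] = acc;
    }
}
```

For example, given a row-major 2x4 array `{1,2,3,4,5,6,7,8}`, calling `strided_gemv(2, 2, a, 4, 2, x, 1, y, 1)` with `x = {1, 1}` multiplies by the sub-matrix made of columns 0 and 2 without copying it, giving `y = {4, 12}`. A real implementation would of course dispatch to optimized kernels rather than use this naive loop.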

Shouldn't the goal of a project like this be to be something that OpenBLAS isn't? Given D's ability to call C and C++ code, it is not clear to me that simply rewriting OpenBLAS in D has any goal for the D or BLAS communities per se. Doesn't stop it being a fun activity for the programmer, obviously, but unless there is something that isn't in OpenBLAS, I cannot see this ever being competition and so building a community around the project.

It depends on what you mean by "something like this". OpenBLAS is a _huge_ amount of assembler code. For _each_ platform, for _each_ CPU generation, and for _each_ floating point / complex type it has one or more kernels. It is 30 MB of assembler code.

Not only can D code call C/C++, but C/C++ (and so any other language) can also call D code. So std.blas may be used in C/C++ projects like Julia.

Now if the threads/OpenCL/CUDA was front and centre so that a goal was to be Nx faster than OpenBLAS, that could be a goal worth standing behind.

It can be a goal for a standalone project. But a standard library should be portable to any platform without significant problems (especially without problems caused by matrix multiplication). So my goal is a tiny and portable project like ATLAS, but fast like OpenBLAS. BTW, threads in std.blas would be optional, as in OpenBLAS. Furthermore, std.blas will allow users to write their own kernels.

Not to mention full N-dimension vectors so that D could seriously compete against Numpy in the Python world.

I am not sure how D can compete against Numpy in the Python world, but it can compete with Python in the world of programming languages. BTW, N-dimensional ranges/arrays/vectors are already implemented for Phobos:


Updated Docs:

Please participate in the voting (the time limit has been extended) :-) http://forum.dlang.org/thread/nexiojzouxtawdwnl...@forum.dlang.org

