The OpenBLAS framework is described in http://dl.acm.org/citation.cfm?id=2503219 and builds on GotoBLAS, which is described in http://dl.acm.org/citation.cfm?id=1377607
Jutho: Part of the conclusion from your link is that you cannot write a fast matrix multiplication in C; you have to write it in assembly to keep the compiler from destroying the performance.

2015-07-08 11:50 GMT-04:00 Jutho <[email protected]>:

> I found a rather instructive tutorial about the kind of optimisations
> going into matrix multiplication in BLIS (not OpenBLAS but related) here:
> http://apfel.mathematik.uni-ulm.de/%7Elehn/sghpc/gemm/index.html
>
> It's not something one can implement in Julia (yet). Hopefully further
> work in the direction of vectorisation of tuples will help (e.g. issue
> #11899 and related)...
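
As a rough illustration of the kind of optimisation under discussion, here is a minimal sketch of loop blocking (tiling) for matrix multiplication in plain C. It is not code from BLIS, OpenBLAS, or the tutorial. The function names `gemm_naive` and `gemm_blocked` and the block sizes are hypothetical, chosen only for this example, and the sketch assumes row-major storage. It leaves out the packing and hand-written micro-kernels that real libraries rely on, so it should not be expected to come close to their performance.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical block sizes, for illustration only.
 * Real libraries tune these per CPU and cache level. */
enum { BLOCK_M = 64, BLOCK_N = 64, BLOCK_K = 64 };

static size_t min_size(size_t a, size_t b) { return a < b ? a : b; }

/* Reference triple loop: C += A * B.
 * Row-major storage; A is m x k, B is k x n, C is m x n. */
void gemm_naive(size_t m, size_t n, size_t k,
                const double *A, const double *B, double *C)
{
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            double acc = 0.0;
            for (size_t p = 0; p < k; ++p)
                acc += A[i * k + p] * B[p * n + j];
            C[i * n + j] += acc;
        }
    }
}

/* Blocked variant: the same arithmetic, but performed tile by tile so
 * that the parts of A, B and C in use are more likely to stay in cache. */
void gemm_blocked(size_t m, size_t n, size_t k,
                  const double *A, const double *B, double *C)
{
    assert(A != NULL && B != NULL && C != NULL);

    for (size_t i0 = 0; i0 < m; i0 += BLOCK_M) {
        size_t i1 = min_size(i0 + BLOCK_M, m);
        for (size_t p0 = 0; p0 < k; p0 += BLOCK_K) {
            size_t p1 = min_size(p0 + BLOCK_K, k);
            for (size_t j0 = 0; j0 < n; j0 += BLOCK_N) {
                size_t j1 = min_size(j0 + BLOCK_N, n);

                /* Update one tile of C using one tile of A and one of B. */
                for (size_t i = i0; i < i1; ++i) {
                    for (size_t p = p0; p < p1; ++p) {
                        double a = A[i * k + p];
                        for (size_t j = j0; j < j1; ++j)
                            C[i * n + j] += a * B[p * n + j];
                    }
                }
            }
        }
    }
}
```

Both functions accumulate into `C`, so the caller has to zero it first if a plain product is wanted. Whether the compiler vectorises the innermost loop well is not guaranteed, which is the point made above about having to drop down to assembly.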
