Eelco Hoogendoorn <[email protected]> wrote:

> I wonder: how hard would it be to create a more 21st-century oriented BLAS,
> relying more on code generation tools, and perhaps LLVM/JITting?
> 
> Wouldn't we get ten times the portability with one-tenth the lines of code?
> Or is there too much dark magic going on in BLAS for such an approach to
> come close enough to hand-tuned performance?

The "dark magic" in OpenBLAS is mostly about placing prefetch instructions
strategically, so that hierarchical memory is used optimally. This is
very hard for a compiler to get right, because it doesn't know matrix
algebra like we do. Prefetching is needed because when two matrices are
multiplied, one of them will be accessed with a stride. SIMD instructions
other than _mm_prefetch, on the other hand, are something a compiler
might be able to generate by auto-vectorization without a lot of help
today.
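
To make the strided access concrete, here is a minimal sketch in C. It is
only an illustration, not how OpenBLAS does it: real kernels are blocked
and hand-written in assembly. The function name matmul_prefetch and the
constant PREFETCH_DISTANCE are made up for this example, and I use the
GCC/Clang builtin __builtin_prefetch as a portable stand-in for
_mm_prefetch. With row-major storage, the inner loop walks A contiguously
but walks B with a stride of n elements, so B is the operand where a
prefetch hint may help.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tuning knob: how many k-iterations ahead to prefetch.
   A good value depends on the machine; 8 is an arbitrary choice. */
#define PREFETCH_DISTANCE 8

/* C = A * B for n-by-n row-major matrices (naive triple loop). */
void matmul_prefetch(size_t n, const double *a, const double *b, double *c)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++) {
                /* b[k*n + j] jumps n doubles per iteration (strided),
                   so hint the element we will need a few steps ahead.
                   Arguments: address, 0 = read, 3 = keep in all cache
                   levels. The hint never changes the result. */
                if (k + PREFETCH_DISTANCE < n) {
                    __builtin_prefetch(&b[(k + PREFETCH_DISTANCE) * n + j], 0, 3);
                }
                /* a[i*n + k] is contiguous in k. */
                sum += a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = sum;
        }
    }
}
```

Calling matmul_prefetch(n, a, b, c) gives the same result as the loop
without the hint; only the memory traffic pattern is affected. Whether it
is actually faster depends on the hardware prefetcher and the matrix
size, which is exactly why this is hard to leave to a compiler.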

Sturla

_______________________________________________
NumPy-Discussion mailing list
[email protected]
http://mail.scipy.org/mailman/listinfo/numpy-discussion
