Thanks Steven, I suspected there was something more behind it...

I should note that I forgot to mention the matrix dimensions, which are 
1000 x 1000.

On Monday, 21 March 2016 10:48:33 UTC+1, Steven G. Johnson wrote:
>
> You need a lot more than just fast loops to match the performance of an 
> optimized BLAS. See e.g. this notebook for some comments on the related 
> case of matrix multiplication:
>
>
> http://nbviewer.jupyter.org/url/math.mit.edu/~stevenj/18.335/Matrix-multiplication-experiments.ipynb
>
