Thanks Steven, I suspected there was something more behind it... I should note that I forgot to mention the matrix dimensions, which are 1000 x 1000.
On Monday, 21 March 2016 10:48:33 UTC+1, Steven G. Johnson wrote:
> You need a lot more than just fast loops to match the performance of an
> optimized BLAS. See e.g. this notebook for some comments on the related
> case of matrix multiplication:
>
> http://nbviewer.jupyter.org/url/math.mit.edu/~stevenj/18.335/Matrix-multiplication-experiments.ipynb
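
To illustrate the kind of restructuring that goes beyond plain loops, here is a minimal sketch of cache blocking (tiling) for matrix multiplication. It is not taken from the notebook linked above: the function names `matmul_naive` and `matmul_blocked` and the block size `bs` are hypothetical, chosen only for this example. Pure Python will not get faster from this; the sketch only shows the loop structure. In a compiled language, blocking is one of several ingredients (along with SIMD and register-level tiling) that optimized BLAS implementations typically combine.

```python
def matmul_naive(A, B, n):
    """Textbook triple loop: C[i][j] = sum over k of A[i][k] * B[k][j]."""
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = 0.0
            for k in range(n):
                s += A[i][k] * B[k][j]
            C[i][j] = s
    return C


def matmul_blocked(A, B, n, bs=4):
    """Same product, computed tile by tile.

    bs is an assumed tile size; in compiled code it would be tuned so
    that the tiles being worked on fit in cache together.
    """
    C = [[0.0] * n for _ in range(n)]
    # Outer three loops walk over tiles of size bs x bs.
    for ii in range(0, n, bs):
        for kk in range(0, n, bs):
            for jj in range(0, n, bs):
                # Inner three loops multiply one tile of A by one tile of B
                # and accumulate into the matching tile of C.
                for i in range(ii, min(ii + bs, n)):
                    Ci, Ai = C[i], A[i]
                    for k in range(kk, min(kk + bs, n)):
                        a, Bk = Ai[k], B[k]
                        for j in range(jj, min(jj + bs, n)):
                            Ci[j] += a * Bk[j]
    return C
```

Both functions compute the same product; only the order in which the entries are visited differs, which is what improves data reuse in cache once the code is compiled.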
