I noticed the commented-out BLAS.gemm! and BLAS.axpy! lines: did these help?
-Simon

On Thursday, 21 January 2016 11:12:53 UTC, Viral Shah wrote:
>
> The matrix-vector multiply in there will lose the benefit of BLAS in
> devectorization. This is one area where we ought to be better, since this
> code is best not devectorized (from a user's perspective).
>
> On my mac, python is .27 seconds and julia 0.4 is .47 seconds. Python is
> perhaps not using a fast BLAS, since it is whatever came with pip.
>
> -viral
>
> On Thursday, January 21, 2016 at 4:22:52 PM UTC+5:30, Kristoffer Carlsson
> wrote:
>>
>> There is no need to annotate your function argument types so tightly,
>> unless you have a good reason for it.
>>
>> You will generate a lot of temporaries in your V = ...
>>
>> Rewrite it as a loop and it will be a lot faster. You could also take a
>> look at the Devectorize.jl package.
