Devectorizing the matrix-vector multiply in there would lose the benefit of BLAS. This is one area where we ought to do better, since from a user's perspective this code is best left vectorized.
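Here is a minimal sketch of a middle ground between full devectorization and the all-vectorized form. The original `V = ...` expression is not shown, so the expression below is a hypothetical stand-in, and the names `update_vectorized`, `update_loop!`, `A`, `x`, `b`, `c` and `alpha` are illustrative only. The idea is to keep the matrix-vector product in BLAS through an in-place call and to fuse only the elementwise arithmetic into a loop, following the loop suggestion in the quoted message. The sketch uses current Julia's `LinearAlgebra.mul!`. In Julia 0.4 the corresponding in-place product was `A_mul_B!`.

```julia
using LinearAlgebra

# Hypothetical stand-in for the elided `V = ...` expression.
# The non-dotted `+` and `-` each allocate a temporary array,
# on top of the one allocated by `A*x`.
function update_vectorized(A, x, b, c, alpha)
    return A*x + alpha .* (b - c)
end

# Same result, but written into preallocated buffers `V` and `tmp`.
function update_loop!(V, tmp, A, x, b, c, alpha)
    mul!(tmp, A, x)  # matrix-vector product still goes through BLAS
    @inbounds for i in eachindex(V, tmp, b, c)
        # elementwise part fused into one pass, with no temporaries
        V[i] = tmp[i] + alpha * (b[i] - c[i])
    end
    return V
end
```

Because `V` and `tmp` are allocated once by the caller, repeated calls to `update_loop!` (for example inside an iteration) do not allocate, while the O(n^2) work still runs in BLAS.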
On my Mac, Python takes 0.27 seconds and Julia 0.4 takes 0.47 seconds. Python is perhaps not using a fast BLAS, since it is whatever came with pip.

-viral

On Thursday, January 21, 2016 at 4:22:52 PM UTC+5:30, Kristoffer Carlsson wrote:
>
> There is no need to annotate your function argument types so tightly,
> unless you have a good reason for it.
>
> You will generate a lot of temporaries in your V = ...
>
> Rewrite it as a loop and it will be a lot faster. You could also take a
> look at the Devectorize.jl package.
>
