Your loop has a ton of unnecessary allocation. You can move xprime=ones(Nalal,Naa) outside the loop. Also, you are converting xprime to a vector at every iteration. You can also do this outside the loop.
After the changes, I get julia> include("test2.jl"); WARNING: redefining constant lib 3.726049 seconds (99.47 k allocations: 15.651 GB, 6.56% gc time) julia> include("test3.jl"); WARNING: redefining constant lib 2.352259 seconds (29.40 k allocations: 5.219 GB, 3.29% gc time) in test3 the performance function is function performance(NoIter::Int64) W_temp = Array(Float64,Nalal*Naa) W_temp2 = Array(Float64,Nalal,Naa) xprime=Array(Float64,Nalal,Naa) xprime=ones(Nalal,Naa) xprime = xprime[:] for banana=1:NoIter W_temp = spl(xprime) end W_temp2 = reshape(W_temp,Nalal,Naa) end I don't know if you have this in your original code though. Maybe it's just your example. I also have need for fast cubic splines because I do dynamic programming, though I don't think Dierckx is a bottleneck for me. I think the Interpolations package might soon have cubic splines on a non-uniform grid, and it's very fast. On Friday, January 29, 2016 at 8:02:53 PM UTC-5, pokerho...@googlemail.com wrote: > > Hi, > > my original problem is a dynammic programming problem in which I need to > interpolate the value function on an irregular grid using a cubic spline > method. I was translating my MATLAB code into Julia and used the Dierckx > package in Julia to do the interpolation (there weren't some many > alternatives that did spline on an irregular grid as far as I recall). In > MATLAB I use interp1. It gave exactly the same result but it was about 50 > times slower which puzzled me. So I made this ( > http://stackoverflow.com/questions/34766029/julia-vs-matlab-why-is-my-julia-code-so-slow) > > stackoverflow post. > > The post is messy and you don't need to read through it I think. The > bottom line was that the Dierckx package apparently calls some Fortran code > which seems to pretty old (and slow, and doesn't use multiple cores. Nobody > knows what exactly the interp1 is doing. My guess is that it's coded in > C?!). > > So I asked a friend of mine who knows a little bit of C and he was so kind > to help me out. He translated the interpolation method into C and made it > such that it uses multiple threads (I am working with 12 threads). He also > put it on github here (https://github.com/nuffe/mnspline). Equipped with > that, I went back to my original problem and reran it. The Julia code was > still 3 times slower which left me puzzled again. The interpolation itself > was much faster now than MATLAB's interp1 but somewhere on the way that > advantage was lost. Consider the following minimal working example > preserving the irregular grid of the original problem which highlights the > point I think (the only action is in the loop, the other stuff is just > generating the irregular grid): > > MATLAB: > > spacing=1.5; > Nxx = 300 ; > Naa = 350; > Nalal = 200; > sigma = 10 ; > NoIter = 10000; > > xx=NaN(Nxx,1); > xmin = 0.01; > xmax = 400; > xx(1) = xmin; > for i=2:Nxx > xx(i) = xx(i-1) + (xmax-xx(i-1))/((Nxx-i+1)^spacing); > end > > f_util = @(c) c.^(1-sigma)/(1-sigma); > W=NaN(Nxx,1); > W(:,1) = f_util(xx); > > W_temp = NaN(Nalal,Naa); > xprime = NaN(Nalal,Naa); > > tic > for banana=1:NoIter > % tic > xprime=ones(Nalal,Naa); > W_temp=interp1(xx,W(:,1),xprime,'spline'); > % toc > end > toc > > > Julia: > > include("mnspline.jl") > > spacing=1.5 > Nxx = 300 > Naa = 350 > Nalal = 200 > sigma = 10 > NoIter = 10000 > > xx=Array(Float64,Nxx) > xmin = 0.01 > xmax = 400 > xx[1] = xmin > for i=2:Nxx > xx[i] = xx[i-1] + (xmax-xx[i-1])/((Nxx-i+1)^spacing) > end > > f_util(c) = c.^(1-sigma)/(1-sigma) > W=Array(Float64,Nxx,1) > W[:,1] = f_util(xx) > > > spl = mnspline(xx,W[:,1]) > > function performance(NoIter::Int64) > W_temp = Array(Float64,Nalal*Naa) > W_temp2 = Array(Float64,Nalal,Naa) > xprime=Array(Float64,Nalal,Naa) > for banana=1:NoIter > xprime=ones(Nalal,Naa) > W_temp = spl(xprime[:]) > end > W_temp2 = reshape(W_temp,Nalal,Naa) > end > > @time performance() > > > In the end I want to have a matrix, that's why I do all this reshaping in > the Julia code. If I comment out the loop and just consider one iteration, > Julia does it in (I ran it several times, precompiling etc) > > 0.000565 seconds (13 allocations: 1.603 MB) > > MATLAB on the other hand: Elapsed time is 0.007864 seconds. > > The bottom line being that in Julia the code is much faster (more than 10 > times in this case), which should be since I use all my threads and the > method is written in C. However, if I don't comment out the loop and run the > code as posted above: > > Julia: > 3.205262 seconds (118.99 k allocations: 15.651 GB, 14.08% gc time) > > MATLAB: > Elapsed time is 4.514449 seconds. > > > If I run the whole loop apparently MATLAB catches up a lot. It is still > slower in this case. In the original problem though Julia was only about 3 > times faster within the loop and once I looped through MATLAB turned out be 3 > times faster. I am stuck here, does anybody have an idea what might be going? > / If I am doing something wrong in my Julia code? A hint what I could do > better? > Right now I am trying to parallelize also the loop. But that's obviously > unfair compared with MATLAB because you could also parallelize that (If you > buy that expensive toolbox). > > > > > > > > >