Your loop has a ton of unnecessary allocation. You can move
xprime = ones(Nalal,Naa) outside the loop. You are also converting xprime
to a vector at every iteration; that can be hoisted out of the loop as well.

After the changes, I get

julia> include("test2.jl");
WARNING: redefining constant lib
  3.726049 seconds (99.47 k allocations: 15.651 GB, 6.56% gc time)

julia> include("test3.jl");
WARNING: redefining constant lib
  2.352259 seconds (29.40 k allocations: 5.219 GB, 3.29% gc time)

In test3 the performance function is:

function performance(NoIter::Int64)
    # hoisted out of the loop: allocate xprime and flatten it once
    xprime = ones(Nalal,Naa)
    xprime = xprime[:]
    W_temp = Array(Float64,Nalal*Naa)
    for banana=1:NoIter
        W_temp = spl(xprime)
    end
    W_temp2 = reshape(W_temp,Nalal,Naa)
end

I don't know if you have this in your original code though. Maybe it's just 
your example.
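A further step, if you wanted to kill the allocation that spl itself makes on every call, would be an in-place evaluation. This is only a sketch: spl! here is a hypothetical mutating variant that writes into a preallocated buffer, which mnspline may not actually expose.

```julia
# Sketch only: spl! is a hypothetical in-place variant of spl that fills
# a preallocated output buffer instead of returning a fresh vector.
function performance_inplace(NoIter::Int64)
    xprime = ones(Nalal*Naa)     # flat input vector, allocated once
    W_temp = similar(xprime)     # output buffer, allocated once
    for banana = 1:NoIter
        spl!(W_temp, xprime)     # no allocation inside the loop
    end
    reshape(W_temp, Nalal, Naa)
end
```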

I also need fast cubic splines because I do dynamic programming, though I
don't think Dierckx is a bottleneck for me. I think the Interpolations
package might soon have cubic splines on a non-uniform grid, and it's very
fast.
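For what it's worth, cubic B-splines on a uniform grid already work there. A minimal sketch, with API names taken from the Interpolations docs as of this writing (they may differ between versions):

```julia
using Interpolations

# Cubic B-spline interpolation of sin on a uniform grid.
# interpolate() works in index space (1:100); scale() maps that
# index space onto the physical coordinates.
xs = linspace(0, 2pi, 100)
yy = [sin(x) for x in xs]
itp = interpolate(yy, BSpline(Cubic(Line())), OnGrid())
sitp = scale(itp, xs)
sitp[3.1]    # approximately sin(3.1)
```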


On Friday, January 29, 2016 at 8:02:53 PM UTC-5, pokerho...@googlemail.com 
wrote:
>
> Hi,
>
> my original problem is a dynamic programming problem in which I need to 
> interpolate the value function on an irregular grid using a cubic spline 
> method. I was translating my MATLAB code into Julia and used the Dierckx 
> package in Julia to do the interpolation (there weren't many alternatives 
> that did splines on an irregular grid, as far as I recall). In MATLAB I use 
> interp1. It gave exactly the same result but was about 50 times slower, 
> which puzzled me. So I made this Stack Overflow post: 
> http://stackoverflow.com/questions/34766029/julia-vs-matlab-why-is-my-julia-code-so-slow
>
> The post is messy and you don't need to read through it, I think. The 
> bottom line was that the Dierckx package apparently calls some Fortran code 
> which seems to be pretty old (and slow, and doesn't use multiple cores). 
> Nobody knows what exactly interp1 is doing; my guess is that it's coded in 
> C. 
>
> So I asked a friend of mine who knows a little bit of C and he was so kind 
> to help me out. He translated the interpolation method into C and made it 
> such that it uses multiple threads (I am working with 12 threads). He also 
> put it on github here (https://github.com/nuffe/mnspline). Equipped with 
> that, I went back to my original problem and reran it. The Julia code was 
> still 3 times slower which left me puzzled again. The interpolation itself 
> was much faster now than MATLAB's interp1 but somewhere on the way that 
> advantage was lost. Consider the following minimal working example 
> preserving the irregular grid of the original problem which highlights the 
> point I think (the only action is in the loop, the other stuff is just 
> generating the irregular grid):
>
> MATLAB:
>
> spacing=1.5;
> Nxx = 300 ;
> Naa = 350;
> Nalal = 200; 
> sigma = 10 ;
> NoIter = 10000; 
>
> xx=NaN(Nxx,1);
> xmin = 0.01;
> xmax = 400;
> xx(1) = xmin;
> for i=2:Nxx
>     xx(i) = xx(i-1) + (xmax-xx(i-1))/((Nxx-i+1)^spacing);
> end
>
> f_util =  @(c) c.^(1-sigma)/(1-sigma);
> W=NaN(Nxx,1);
> W(:,1) = f_util(xx);
>
> W_temp = NaN(Nalal,Naa);
> xprime = NaN(Nalal,Naa);
>
> tic
> for banana=1:NoIter
> %   tic
>   xprime=ones(Nalal,Naa);
>   W_temp=interp1(xx,W(:,1),xprime,'spline');
> %   toc
> end
> toc
>
>
> Julia:
>
> include("mnspline.jl")
>
> spacing=1.5
> Nxx = 300
> Naa = 350
> Nalal = 200
> sigma = 10
> NoIter = 10000
>
> xx=Array(Float64,Nxx)
> xmin = 0.01
> xmax = 400
> xx[1] = xmin
> for i=2:Nxx
>     xx[i] = xx[i-1] + (xmax-xx[i-1])/((Nxx-i+1)^spacing)
> end
>
> f_util(c) =  c.^(1-sigma)/(1-sigma)
> W=Array(Float64,Nxx,1)
> W[:,1] = f_util(xx)
>
>
> spl = mnspline(xx,W[:,1])
>
> function performance(NoIter::Int64)
>     W_temp = Array(Float64,Nalal*Naa)
>     W_temp2 = Array(Float64,Nalal,Naa)
>     xprime=Array(Float64,Nalal,Naa)
>     for banana=1:NoIter
>         xprime=ones(Nalal,Naa)
>         W_temp = spl(xprime[:])
>     end
>     W_temp2 = reshape(W_temp,Nalal,Naa)
> end
>
> @time performance(NoIter)
>
>
> In the end I want to have a matrix, that's why I do all this reshaping in 
> the Julia code. If I comment out the loop and just consider one iteration, 
> Julia does it in  (I ran it several times, precompiling etc)
>
> 0.000565 seconds (13 allocations: 1.603 MB)
>
> MATLAB on the other hand: Elapsed time is 0.007864 seconds.
>
> The bottom line being that in Julia the code is much faster (more than 10 
> times in this case), which is to be expected since I use all my threads and 
> the method is written in C. However, if I don't comment out the loop and 
> run the code as posted above:
>
> Julia:
> 3.205262 seconds (118.99 k allocations: 15.651 GB, 14.08% gc time)
>
> MATLAB:
> Elapsed time is 4.514449 seconds.
>
>
> If I run the whole loop, MATLAB apparently catches up a lot. It is still 
> slower in this case, but in the original problem Julia was only about 3 
> times faster within the loop, and once I looped through, MATLAB turned out 
> to be 3 times faster. I am stuck here; does anybody have an idea what might 
> be going on, or whether I am doing something wrong in my Julia code? A hint 
> what I could do better?
> Right now I am also trying to parallelize the loop. But that's obviously 
> unfair compared with MATLAB, because you could also parallelize that (if 
> you buy that expensive toolbox). 
>
