I did not intend to suggest that it would be good for the interface to
hard-code the function; I just noted that some overhead vanishes. There might
be some metaprogramming tricks you can use to get your desired interface back.

The usual idea in Julia for returning an array without allocation is the same
as in C: you allocate the array in the calling function and modify it inside
the called function. In Julia the convention is that such functions end in !
and modify their first argument.
function gravity!(ret::Vector{Float64}, t::Float64, y::Vector{Float64},
                  mu::Float64)
    # Write the derivative into the preallocated buffer ret instead of
    # returning a freshly allocated array.
    r = norm(y[1:3])
    ret[1:3] = y[4:6]
    ret[4:6] = -mu*y[1:3]/r/r/r
    return ret
end
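
The calling side then allocates the work array once, outside the integration
loop, and passes it in on every call. A minimal sketch (the values below are
only illustrative):

mu   = 398600.4418                       # gravitational parameter, km^3/s^2
y    = [7000.0, 0.0, 0.0, 0.0, 7.5, 0.0] # state: position (km) and velocity (km/s)
ydot = zeros(6)                          # work array, allocated once

gravity!(ydot, 0.0, y, mu)               # fills ydot in place, no new array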
Unfortunately, Julia does not currently support efficient (non-copying)
slicing, so I was unsure whether you would gain anything from this approach,
because you use the returned array on multiple occasions later.
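
If the slice copies turn out to matter, one option is to devectorize the
in-place version completely, so that neither the indexing nor the right-hand
sides allocate temporaries. This is only a sketch; the name gravity_devec! is
made up:

function gravity_devec!(ret::Vector{Float64}, t::Float64, y::Vector{Float64},
                        mu::Float64)
    # Unpack the position components without allocating a slice
    x1, x2, x3 = y[1], y[2], y[3]
    r  = sqrt(x1*x1 + x2*x2 + x3*x3)
    r3 = r*r*r
    # Derivative of the position is the velocity
    ret[1] = y[4]
    ret[2] = y[5]
    ret[3] = y[6]
    # Derivative of the velocity is the gravitational acceleration
    ret[4] = -mu*x1/r3
    ret[5] = -mu*x2/r3
    ret[6] = -mu*x3/r3
    return ret
end

Since this writes directly into whatever buffer it is given, the result could
go straight into your large work array without an extra copy.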
On Wednesday, 18 December 2013 at 15:09:18 UTC+1, Helge Eichhorn wrote:
>
> I have updated the Gist <https://gist.github.com/helgee/8019521>.
>
>
> 2013/12/18 Helge Eichhorn <[email protected]>
>
>> @Ivar: Using a regular function is only possible in this special case,
>> because generally the user must supply the callback function.
>>
>> @Tim: I think I can also rule out type problems; at least the output of
>> code_typed is free of Unions. I also forgot to mention that I have already
>> done a lot of profiling.
>>
>> What did help was re-writing the callback function like so:
>>
>> function gravity1(t::Float64, y::Vector{Float64}, mu::Float64)
>>     x, y, z, vx, vy, vz = y
>>     r = sqrt(x*x + y*y + z*z)
>>     r3 = r*r*r
>>     [vx, vy, vz, -mu*x/r3, -mu*y/r3, -mu*z/r3]
>> end
>>
>> and especially reducing memory allocations by writing the intermediate
>> solutions in the core integrator to a single large work array. This feels
>> worse style-wise since the function now has side effects, but the runtime
>> is down to *1.7e-4s*.
>>
>> According to the sampling profiler, the remaining bottleneck is the
>> allocation of the array that is returned by the callback function. I was
>> thinking about returning a tuple instead, but that seems rather impractical
>> because the data will be written to the work array afterwards. Other ideas?
>>
>>
>> 2013/12/18 Tim Holy <[email protected]>
>>
>>> Also, I just added a new section
>>> http://docs.julialang.org/en/latest/manual/performance-tips/#tools
>>> that advertises the available tools for helping you diagnose performance
>>> problems.
>>>
>>> Without taking the time to look at your code, I'll just add that
>>> whenever I
>>> see an orders-of-magnitude discrepancy between C/Fortran and Julia, my
>>> first
>>> instinct is to suspect a type problem. The fact that a vectorized
>>> version is a
>>> bit faster than one written with loops might also support this diagnosis.
>>>
>>> Best,
>>> --Tim
>>>
>>> On Wednesday, December 18, 2013 03:26:11 AM Ivar Nesje wrote:
>>> > My first suggestion to anyone trying to write fast Julia programs is to
>>> > read http://docs.julialang.org/en/latest/manual/performance-tips/; those
>>> > are all good tips that I do not think will become obsolete as Julia
>>> > improves. It seems to me that you already know those points.
>>> >
>>> > I think you get an important hint from the fact that devectorization
>>> > does not matter. To me it seems like the current bottleneck is that you
>>> > use an anonymous function instead of a regular function. When I replace
>>> > "f(" by "gravity(" I get some improvement, and then your devectorisation
>>> > attempts make a significant difference. Further, you might want to try to
>>> > reduce the amount of memory allocated, but that seems to complicate your
>>> > code quite a lot.
>>> >
>>> > My improvements reduce the timing as follows for 1000 iterations.
>>> > ivar@Ivar-ubuntu:~/tmp$ julia doptest.jl
>>> > elapsed time: 0.878398771 seconds (513399840 bytes allocated)
>>> > ivar@Ivar-ubuntu:~/tmp$ julia dopitest.jl
>>> > elapsed time: 0.16916126 seconds (122423840 bytes allocated)
>>> >
>>> > On Wednesday, 18 December 2013 at 11:07:30 UTC+1, Helge Eichhorn wrote:
>>> > > Hi,
>>> > >
>>> > > I spent the last few days porting the well-known DOP853 integrator
>>> > > <http://www.unige.ch/~hairer/software.html> to Julia. The process was
>>> > > quite smooth and I have implemented the core functionality.
>>> > > However, when I run my reference case, a numerical solution of the
>>> > > two-body problem, I get the following timings:
>>> > >
>>> > > - *Fortran* (gfortran 4.8.2, no optimizations): *~1.7e-5s*
>>> > > - *Julia* (master, looped): *~1.3e-3s*
>>> > > - *Julia* (master, vectorized): *~1e-3s (!)*
>>> > >
>>> > > I have posted the Julia code and the Fortran reference in this Gist
>>> > > <https://gist.github.com/helgee/8019521>. The computationally expensive
>>> > > part seems to be contained in the *dopcore* or *dopcorevec* function,
>>> > > respectively. What I really do not understand is why the vectorized
>>> > > expressions seem to run faster, or rather, what I am doing wrong here.
>>> > >
>>> > > Any ideas or suggestions? Many thanks in advance!
>>> > > Helge
>>>
>>
>>
>