Since your vectors are of fixed size, you may want to consider using
tuples. Of course, since tuples aren't mutable, that may require
refactoring a bit.
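
For example, something along these lines might work (an untested sketch that reuses the signature of your gravity1 function below; gravity_tuple, work, and offset are just placeholder names):

function gravity_tuple(t::Float64, y::Vector{Float64}, mu::Float64)
    # unpack the state: position and velocity
    # (the second component is called yy here to avoid shadowing the argument)
    x, yy, z, vx, vy, vz = y
    r = sqrt(x*x + yy*yy + z*z)
    r3 = r*r*r
    # returning a tuple instead of a Vector may avoid the per-call allocation
    (vx, vy, vz, -mu*x/r3, -mu*yy/r3, -mu*z/r3)
end

Tuples support 1-based indexing just like arrays, so copying the result into the
work array afterwards stays simple:

dy = gravity_tuple(t, y, mu)
for i = 1:6
    work[offset + i] = dy[i]
end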


On Wed, Dec 18, 2013 at 9:09 AM, Helge Eichhorn <[email protected]> wrote:

> I have updated the Gist <https://gist.github.com/helgee/8019521>.
>
>
> 2013/12/18 Helge Eichhorn <[email protected]>
>
>> @Ivar: Using a regular function is only possible in this special case,
>> because generally the user must supply the callback function.
>>
>> @Tim: I think that I can also rule out type problems. At least the output
>> of code_typed is clean of Unions. Also, I forgot to mention that I have
>> already done a lot of profiling.
>>
>> What did help was rewriting the callback function like so:
>>
>> function gravity1(t::Float64, y::Vector{Float64}, mu::Float64)
>>     x, y, z, vx, vy, vz = y
>>     r = sqrt(x*x+y*y+z*z)
>>     r3 = r*r*r
>>     [vx, vy, vz, -mu*x/r3, -mu*y/r3, -mu*z/r3]
>> end
>>
>> and especially reducing memory allocations by writing the intermediate
>> solutions in the core integrator to a single large work array. This feels
>> worse style-wise since the function now has side effects, but the runtime
>> is down to *1.7e-4s*.
>>
>> According to the sampling profiler, the remaining bottleneck is the
>> allocation of the array that is returned by the callback function. I was
>> thinking about returning a tuple instead, but that seems rather impractical
>> because the data will be written to the work array afterwards. Other ideas?
>>
>>
>> 2013/12/18 Tim Holy <[email protected]>
>>
>>> Also, I just added a new section
>>> http://docs.julialang.org/en/latest/manual/performance-tips/#tools
>>> that advertises the available tools for helping you diagnose performance
>>> problems.
>>>
>>> Without taking the time to look at your code, I'll just add that
>>> whenever I
>>> see an orders-of-magnitude discrepancy between C/Fortran and Julia, my
>>> first
>>> instinct is to suspect a type problem. The fact that a vectorized
>>> version is a
>>> bit faster than one written with loops might also support this diagnosis.
>>>
>>> Best,
>>> --Tim
>>>
>>> On Wednesday, December 18, 2013 03:26:11 AM Ivar Nesje wrote:
>>> > My first suggestion to anyone trying to write fast Julia programs is to
>>> > read http://docs.julialang.org/en/latest/manual/performance-tips/; those
>>> > are all good tips that I do not think will become obsolete when Julia
>>> > improves. It seems to me that you know those points.
>>> >
>>> > I think you get an important hint from the fact that devectorization does
>>> > not matter. To me it seems like the current bottleneck is that you use an
>>> > anonymous function instead of a regular function. When I replace "f(" by
>>> > "gravity(" I get some improvement, and then your devectorization attempts
>>> > make a significant difference. Further, you might want to try to reduce
>>> > the amount of memory allocated, but that seems to complicate your code
>>> > quite a bit.
>>> >
>>> > My improvements reduce the timings as follows for 1000 iterations.
>>> > ivar@Ivar-ubuntu:~/tmp$ julia doptest.jl
>>> > elapsed time: 0.878398771 seconds (513399840 bytes allocated)
>>> > ivar@Ivar-ubuntu:~/tmp$ julia dopitest.jl
>>> > elapsed time: 0.16916126 seconds (122423840 bytes allocated)
>>> >
>>> > At 11:07:30 UTC+1 on Wednesday, 18 December 2013, Helge Eichhorn wrote:
>>> > > Hi,
>>> > >
>>> > > I spent the last few days porting the well-known DOP853
>>> > > <http://www.unige.ch/~hairer/software.html> integrator to Julia. The
>>> > > process was quite smooth and I have implemented the core functionality.
>>> > > However, when I run my reference case, a numerical solution of the
>>> > > two-body problem, I get the following timings:
>>> > >
>>> > >    - *Fortran* (gfortran 4.8.2, no optimizations): *~1.7e-5s*
>>> > >    - *Julia* (master, looped): *~1.3e-3s*
>>> > >    - *Julia* (master, vectorized): *~1e-3s* (!)
>>> > >
>>> > > I have posted the Julia code and the Fortran reference in this
>>> > > Gist <https://gist.github.com/helgee/8019521>. The computationally
>>> > > expensive part seems to be contained in the *dopcore* or *dopcorevec*
>>> > > function, respectively. What I really do not understand is why the
>>> > > vectorized expressions seem to run faster, or rather, what I am doing
>>> > > wrong here.
>>> > >
>>> > > Any ideas or suggestions? Many thanks in advance!
>>> > > Helge
>>>
>>
>>
>
