@Ivar: Using a regular function is only possible in this special case,
because generally the user must supply the callback function.

@Tim: I think I can also rule out type problems; at least the output
of code_typed is free of Unions. I also forgot to mention that I have
already done a lot of profiling.
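For reference, the type check was essentially the following (the function shown is a stand-in for the actual callback, not the real code):

```julia
# Minimal sketch of the type-stability check; `rhs` is a stand-in for the
# actual callback. A clean result contains no Union types in the inferred
# return type or local variables.
rhs(t::Float64, y::Vector{Float64}, mu::Float64) = -mu .* y

ct = code_typed(rhs, (Float64, Vector{Float64}, Float64))
println(ct)  # inspect the output for Union{...}
```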

What did help was re-writing the callback function like so:

function gravity1(t::Float64, y::Vector{Float64}, mu::Float64)
    # note: destructuring rebinds y, shadowing the input state vector
    x, y, z, vx, vy, vz = y
    r = sqrt(x*x + y*y + z*z)
    r3 = r*r*r
    [vx, vy, vz, -mu*x/r3, -mu*y/r3, -mu*z/r3]
end

and especially reducing memory allocations by writing the intermediate
solutions in the core integrator to a single large work array. This feels
worse style-wise since the function now has side effects, but the runtime
is down to *1.7e-4s*.
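For illustration, the work-array change amounts to something like this (the names, shapes, and stage loop below are hypothetical, not the actual integrator code):

```julia
# Hypothetical sketch: preallocate one work matrix for all intermediate
# stage solutions and overwrite it in place on every step, instead of
# allocating a fresh vector per stage.
function stages!(work::Matrix{Float64}, f, t::Float64, y::Vector{Float64},
                 h::Float64, mu::Float64)
    nstages = size(work, 2)
    for s in 1:nstages
        # each stage result is written into a column of `work`;
        # note the callback's return value itself is still allocated
        work[:, s] = f(t + (s - 1) * h / nstages, y, mu)
    end
    return work
end

work = zeros(6, 12)                      # allocated once, reused every step
y0 = [1.0, 0.0, 0.0, 0.0, 1.0, 0.0]
stages!(work, (t, y, mu) -> -mu .* y, 0.0, y0, 0.01, 1.0)
```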

According to the sampling profiler, the remaining bottleneck is the
allocation of the array that is returned by the callback function. I was
thinking about returning a tuple instead, but that seems rather impractical
because the data will be written to the work array afterwards. Other ideas?
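One alternative that would remove the per-call allocation entirely is an in-place callback that writes into a caller-supplied output vector. A minimal sketch (the signature is hypothetical and would require changing the integrator's calling convention):

```julia
# Sketch of an in-place callback: the integrator passes a preallocated
# output vector `dy`, so the callback allocates nothing per call.
function gravity!(dy::Vector{Float64}, t::Float64, y::Vector{Float64},
                  mu::Float64)
    x, yy, z, vx, vy, vz = y            # `yy` avoids shadowing the input
    r = sqrt(x*x + yy*yy + z*z)
    r3 = r*r*r
    dy[1] = vx; dy[2] = vy; dy[3] = vz
    dy[4] = -mu*x/r3
    dy[5] = -mu*yy/r3
    dy[6] = -mu*z/r3
    return dy
end

dy = zeros(6)
gravity!(dy, 0.0, [1.0, 0.0, 0.0, 0.0, 1.0, 0.0], 1.0)
```

Since the data ends up in the work array anyway, `dy` could even alias a column of it, avoiding any temporary.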


2013/12/18 Tim Holy <[email protected]>

> Also, I just added a new section
> http://docs.julialang.org/en/latest/manual/performance-tips/#tools
> that advertises the available tools for helping you diagnose performance
> problems.
>
> Without taking the time to look at your code, I'll just add that whenever I
> see an orders-of-magnitude discrepancy between C/Fortran and Julia, my
> first
> instinct is to suspect a type problem. The fact that a vectorized version
> is a
> bit faster than one written with loops might also support this diagnosis.
>
> Best,
> --Tim
>
> On Wednesday, December 18, 2013 03:26:11 AM Ivar Nesje wrote:
> > My first suggestion to anyone trying to write fast Julia programs is to
> > read http://docs.julialang.org/en/latest/manual/performance-tips/, those
> > are all good tips that I do not think will get obsolete when Julia
> > improves. It seems to me like you know those points.
> >
> > I think you get an important hint from the fact that devectorization does
> > not matter. To me it seems like the current bottleneck is that you use an
> > anonymous function instead of a regular function. When I replace "f(" by
> > "gravity(" I get some improvement, and then your devectorization attempts
> > make a significant difference. Further, you might want to try to reduce
> > the amount of memory allocated, but that seems to complicate your code
> > quite a bit.
> >
> > My improvements reduce the timing as follows for 1000 iterations.
> > ivar@Ivar-ubuntu:~/tmp$ julia doptest.jl
> > elapsed time: 0.878398771 seconds (513399840 bytes allocated)
> > ivar@Ivar-ubuntu:~/tmp$ julia dopitest.jl
> > elapsed time: 0.16916126 seconds (122423840 bytes allocated)
> >
> > On Wednesday, 18 December 2013 at 11:07:30 UTC+1, Helge Eichhorn wrote:
> > > Hi,
> > >
> > > I spent the last few days porting the well-known DOP853 integrator
> > > (http://www.unige.ch/~hairer/software.html) to Julia. The process was
> > > quite smooth and I have implemented the core functionality.
> > > However, when I run my reference case, a numerical solution of the
> > > two-body problem, I get the following timings:
> > >    - *Fortran* (gfortran 4.8.2, no optimizations): *~1.7e-5s*
> > >    - *Julia* (master, looped): *~1.3e-3s*
> > >    - *Julia* (master, vectorized): *~1e-3s (!)*
> > >
> > > I have posted the Julia code and the Fortran reference in this Gist
> > > (https://gist.github.com/helgee/8019521). The computationally expensive
> > > part seems to be contained in the *dopcore* or *dopcorevec* function,
> > > respectively. What I really do not understand is why the vectorized
> > > version seems to run faster, or rather what I am doing wrong here.
> > >
> > > Any ideas or suggestions? Many thanks in advance!
> > > Helge
>
