I have updated the Gist <https://gist.github.com/helgee/8019521>.
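A minimal sketch of the in-place approach discussed below, where the callback writes the derivative into a preallocated buffer instead of returning a fresh array on every call (the name `gravity!`, the signature, and the sample values are illustrative, not the code from the Gist):

```julia
# In-place variant of the two-body right-hand side: the derivative is
# written into the preallocated vector `dy`, so the callback itself
# performs no heap allocation per call.
function gravity!(dy::Vector{Float64}, t::Float64,
                  y::Vector{Float64}, mu::Float64)
    x, yy, z, vx, vy, vz = y
    r = sqrt(x*x + yy*yy + z*z)
    r3 = r*r*r
    dy[1] = vx
    dy[2] = vy
    dy[3] = vz
    dy[4] = -mu*x/r3
    dy[5] = -mu*yy/r3
    dy[6] = -mu*z/r3
    return dy
end

# The integrator would pass a slot of its work array as `dy`:
dy = zeros(6)
y0 = [7000.0, 0.0, 0.0, 0.0, 7.5, 0.0]  # illustrative state vector [km, km/s]
gravity!(dy, 0.0, y0, 398600.4418)      # illustrative mu for Earth [km^3/s^2]
```

The trade-off is the one mentioned below: the function now has a side effect, but the per-call allocation disappears.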
2013/12/18 Helge Eichhorn <[email protected]>:

> @Ivar: Using a regular function is only possible in this special case,
> because generally the user must supply the callback function.
>
> @Tim: I think I can also rule out type problems; at least the output of
> code_typed is free of Unions. I also forgot to mention that I have
> already done a lot of profiling.
>
> What did help was rewriting the callback function like so:
>
>     function gravity1(t::Float64, y::Vector{Float64}, mu::Float64)
>         x, y, z, vx, vy, vz = y
>         r = sqrt(x*x + y*y + z*z)
>         r3 = r*r*r
>         [vx, vy, vz, -mu*x/r3, -mu*y/r3, -mu*z/r3]
>     end
>
> and especially reducing memory allocations by writing the intermediate
> solutions in the core integrator to a single large work array. This feels
> worse style-wise, since the function now has side effects, but the runtime
> is down to *1.7e-4s*.
>
> According to the sampling profiler, the remaining bottleneck is the
> allocation of the array that is returned by the callback function. I was
> thinking about returning a tuple instead, but that seems rather
> impractical because the data is written to the work array afterwards.
> Other ideas?
>
>
> 2013/12/18 Tim Holy <[email protected]>:
>
>> Also, I just added a new section
>> http://docs.julialang.org/en/latest/manual/performance-tips/#tools
>> that advertises the available tools for helping you diagnose performance
>> problems.
>>
>> Without taking the time to look at your code, I'll just add that whenever
>> I see an orders-of-magnitude discrepancy between C/Fortran and Julia, my
>> first instinct is to suspect a type problem. The fact that a vectorized
>> version is a bit faster than one written with loops might also support
>> this diagnosis.
>>
>> Best,
>> --Tim
>>
>> On Wednesday, December 18, 2013 03:26:11 AM Ivar Nesje wrote:
>> > My first suggestion to anyone trying to write fast Julia programs is to
>> > read http://docs.julialang.org/en/latest/manual/performance-tips/;
>> > those are all good tips that I do not think will become obsolete as
>> > Julia improves. It seems to me that you know those points.
>> >
>> > I think you get an important hint from the fact that devectorization
>> > does not matter. To me it seems that the current bottleneck is that you
>> > use an anonymous function instead of a regular function. When I replace
>> > "f(" by "gravity(", I get some improvement, and then your
>> > devectorization attempts make a significant difference. Further, you
>> > might want to try to reduce the amount of memory allocated, but that
>> > seems to complicate your code quite a lot.
>> >
>> > My improvements reduce the timing as follows for 1000 iterations:
>> >
>> >     ivar@Ivar-ubuntu:~/tmp$ julia doptest.jl
>> >     elapsed time: 0.878398771 seconds (513399840 bytes allocated)
>> >     ivar@Ivar-ubuntu:~/tmp$ julia dopitest.jl
>> >     elapsed time: 0.16916126 seconds (122423840 bytes allocated)
>> >
>> > At 11:07:30 UTC+1 on Wednesday, 18 December 2013, Helge Eichhorn wrote:
>> > > Hi,
>> > >
>> > > I spent the last few days porting the well-known
>> > > DOP853 <http://www.unige.ch/~hairer/software.html> integrator to
>> > > Julia. The process was quite smooth and I have implemented the core
>> > > functionality. However, when I run my reference case, a numerical
>> > > solution of the two-body problem, I get the following timings:
>> > >
>> > > - *Fortran* (gfortran 4.8.2, no optimizations): *~1.7e-5s*
>> > > - *Julia* (master, looped): *~1.3e-3s*
>> > > - *Julia* (master, vectorized): *~1e-3s (!)*
>> > >
>> > > I have posted the Julia code and the Fortran reference in this
>> > > Gist <https://gist.github.com/helgee/8019521>.
>> > > The computationally expensive part seems to be contained in the
>> > > *dopcore* or *dopcorevec* function, respectively. What I really do
>> > > not understand is why the vectorized expressions seem to run faster,
>> > > or rather, what I am doing wrong here.
>> > >
>> > > Any ideas or suggestions? Many thanks in advance!
>> > > Helge
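The tuple-returning idea raised at the top of the thread could be sketched as follows (the name `gravity_tuple` and the unpacking loop are illustrative; whether copying the tuple into the work array pays off is exactly the open question):

```julia
# Tuple-returning variant: a tuple of Float64 values does not require a
# heap-allocated array, so the callback itself allocates nothing.
function gravity_tuple(t::Float64, y::Vector{Float64}, mu::Float64)
    x, yy, z, vx, vy, vz = y
    r = sqrt(x*x + yy*yy + z*z)
    r3 = r*r*r
    return (vx, vy, vz, -mu*x/r3, -mu*yy/r3, -mu*z/r3)
end

# The integrator would then copy the result into its work array:
work = zeros(6)
dy = gravity_tuple(0.0, [1.0, 0.0, 0.0, 0.0, 1.0, 0.0], 1.0)
for i in 1:6
    work[i] = dy[i]
end
```

Compared with the in-place `gravity!` style, this keeps the callback free of side effects at the cost of an explicit copy in the caller.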
