Since your vectors are of fixed size, you may want to consider using tuples. Of course, since tuples aren't mutable, that may require a bit of refactoring.
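A minimal sketch of what the tuple approach might look like, based on the `gravity1` function quoted below in the thread. The names `gravity_tuple` and `work`, and the unpacking loop, are illustrative assumptions, not code from the Gist:

```julia
# Sketch: the callback returns an immutable tuple instead of allocating
# a fresh Vector on every evaluation.
function gravity_tuple(t::Float64, y::Vector{Float64}, mu::Float64)
    x, yy, z = y[1], y[2], y[3]
    vx, vy, vz = y[4], y[5], y[6]
    r = sqrt(x*x + yy*yy + z*z)
    r3 = r*r*r
    (vx, vy, vz, -mu*x/r3, -mu*yy/r3, -mu*z/r3)   # tuple, not Vector
end

# Inside the integrator, the tuple can be unpacked into a column of a
# preallocated work array without an intermediate Vector:
work = zeros(6, 12)
dy = gravity_tuple(0.0, [1.0, 0.0, 0.0, 0.0, 1.0, 0.0], 1.0)
for i in 1:6
    work[i, 1] = dy[i]
end
```

Note that the tuple is read-only; the mutation happens only on the work array, which sidesteps the immutability concern at the cost of an explicit unpacking loop.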
On Wed, Dec 18, 2013 at 9:09 AM, Helge Eichhorn <[email protected]> wrote:

> I have updated the Gist <https://gist.github.com/helgee/8019521>.
>
> 2013/12/18 Helge Eichhorn <[email protected]>
>
>> @Ivar: Using a regular function is only possible in this special case,
>> because generally the user must supply the callback function.
>>
>> @Tim: I think that I can also rule out type problems. At least the output
>> of code_typed is clean of Unions. I also forgot to mention that I had
>> already done a lot of profiling.
>>
>> What did help was rewriting the callback function like so:
>>
>> function gravity1(t::Float64, y::Vector{Float64}, mu::Float64)
>>     x, y, z, vx, vy, vz = y
>>     r = sqrt(x*x + y*y + z*z)
>>     r3 = r*r*r
>>     [vx, vy, vz, -mu*x/r3, -mu*y/r3, -mu*z/r3]
>> end
>>
>> and especially reducing memory allocations by writing the intermediary
>> solutions in the core integrator to a single large work array. This feels
>> worse style-wise since the function now has side effects, but the runtime
>> is down to *1.7e-4s*.
>>
>> According to the sampling profiler the remaining bottleneck is the
>> allocation of the array that is returned by the callback function. I was
>> thinking about returning a tuple instead, but that seems rather
>> impractical because the data will be written to the work array afterwards.
>> Other ideas?
>>
>> 2013/12/18 Tim Holy <[email protected]>
>>
>>> Also, I just added a new section
>>> http://docs.julialang.org/en/latest/manual/performance-tips/#tools
>>> that advertises the available tools for helping you diagnose performance
>>> problems.
>>>
>>> Without taking the time to look at your code, I'll just add that
>>> whenever I see an orders-of-magnitude discrepancy between C/Fortran and
>>> Julia, my first instinct is to suspect a type problem. The fact that a
>>> vectorized version is a bit faster than one written with loops might
>>> also support this diagnosis.
>>>
>>> Best,
>>> --Tim
>>>
>>> On Wednesday, December 18, 2013 03:26:11 AM Ivar Nesje wrote:
>>> > My first suggestion to anyone trying to write fast Julia programs is
>>> > to read http://docs.julialang.org/en/latest/manual/performance-tips/;
>>> > those are all good tips that I do not think will become obsolete when
>>> > Julia improves. It seems to me that you already know those points.
>>> >
>>> > I think you get an important hint from the fact that devectorization
>>> > does not matter. To me it seems like the current bottleneck is that
>>> > you use an anonymous function instead of a regular function. When I
>>> > replace "f(" by "gravity(", I get some improvement, and then your
>>> > devectorization attempts make a significant difference. Further, you
>>> > might want to try to reduce the amount of memory allocated, but that
>>> > seems to complicate your code quite a lot.
>>> >
>>> > My improvements reduce the timing as follows for 1000 iterations:
>>> > ivar@Ivar-ubuntu:~/tmp$ julia doptest.jl
>>> > elapsed time: 0.878398771 seconds (513399840 bytes allocated)
>>> > ivar@Ivar-ubuntu:~/tmp$ julia dopitest.jl
>>> > elapsed time: 0.16916126 seconds (122423840 bytes allocated)
>>> >
>>> > On Wednesday, 18 December 2013 11:07:30 UTC+1, Helge Eichhorn wrote:
>>> > > Hi,
>>> > >
>>> > > I spent the last few days porting the well-known DOP853
>>> > > <http://www.unige.ch/~hairer/software.html> integrator to Julia.
>>> > > The process was quite smooth and I have implemented the core
>>> > > functionality. However, when I run my reference case, a numerical
>>> > > solution of the two-body problem, I get the following timings:
>>> > > - *Fortran* (gfortran 4.8.2, no optimizations): *~1.7e-5s*
>>> > > - *Julia* (master, looped): *~1.3e-3s*
>>> > > - *Julia* (master, vectorized): *~1e-3s (!)*
>>> > >
>>> > > I have posted the Julia code and the Fortran reference in this Gist
>>> > > <https://gist.github.com/helgee/8019521>. The computationally
>>> > > expensive part seems to be contained in the *dopcore* or
>>> > > *dopcorevec* function, respectively. What I really do not
>>> > > understand is why the vectorized expressions seem to run faster,
>>> > > or rather, what I am doing wrong here.
>>> > >
>>> > > Any ideas or suggestions? Many thanks in advance!
>>> > > Helge
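For reference, the in-place pattern Helge describes (writing the derivative into a caller-supplied array instead of returning a freshly allocated one) might look roughly like the sketch below. The `gravity!` name and signature are illustrative assumptions, not the actual code from the Gist:

```julia
# Illustrative in-place variant: the caller supplies the output array dy,
# so the callback allocates nothing per evaluation. The trailing `!`
# follows the Julia convention for functions that mutate an argument.
function gravity!(dy::Vector{Float64}, t::Float64,
                  y::Vector{Float64}, mu::Float64)
    x, yy, z = y[1], y[2], y[3]
    vx, vy, vz = y[4], y[5], y[6]
    r3 = (x*x + yy*yy + z*z)^1.5      # |r|^3
    dy[1] = vx
    dy[2] = vy
    dy[3] = vz
    dy[4] = -mu*x/r3
    dy[5] = -mu*yy/r3
    dy[6] = -mu*z/r3
    return dy
end

# The integrator would reuse one buffer across all stage evaluations:
dy = zeros(6)
gravity!(dy, 0.0, [1.0, 0.0, 0.0, 0.0, 1.0, 0.0], 1.0)
```

This is the trade-off discussed in the thread: the side effect makes the API less pure, but it removes the per-call array allocation that the sampling profiler identified as the bottleneck.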
