Thanks Ivar and Tim, that did the trick, and the performance is within reach of Fortran: *9.5e-5s*. I will implement the missing functionality and do some cleanup so this can go into ODE.jl. Thanks for the help, everybody!

2013/12/18 Ivar Nesje <[email protected]>

> I did not intend to suggest that it would be good for the interface to
> hard-code the function. I just noted that some overhead vanishes. There
> might be some metaprogramming tricks you can do to get your desired
> interface back.
>
> The usual idea in Julia for returning an array without allocation is the
> same as in C: you must allocate the array in the calling function and
> modify it inside the function. In Julia the convention is that such
> functions end in ! and modify their first argument.
>
>     function gravity!(ret::Vector{Float64}, t::Float64, y::Vector{Float64}, mu::Float64)
>         r = norm(y[1:3])
>         ret[1:3] = y[4:6]
>         ret[4:6] = -mu*y[1:3]/r/r/r
>     end
>
> Unfortunately Julia does not support efficient slicing, so I was unsure
> whether you could gain anything from this approach, because you use the
> returned array on multiple occasions later.
>
> On Wednesday, 18 December 2013 at 15:09:18 UTC+1, Helge Eichhorn wrote:
>>
>> I have updated the Gist <https://gist.github.com/helgee/8019521>.
>>
>> 2013/12/18 Helge Eichhorn <[email protected]>
>>
>>> @Ivar: Using a regular function is only possible in this special case,
>>> because in general the user must supply the callback function.
>>>
>>> @Tim: I think that I can also rule out type problems; at least the
>>> output of code_typed is free of Unions. I should also mention that I
>>> already did a lot of profiling.
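
The in-place convention Ivar describes can be driven from a caller like this; the buffer is allocated once and reused across all derivative evaluations. A minimal sketch, assuming an illustrative state vector, mu value, and call loop that are not taken from the Gist:

```julia
function gravity!(ret::Vector{Float64}, t::Float64, y::Vector{Float64}, mu::Float64)
    r = norm(y[1:3])
    ret[1:3] = y[4:6]
    ret[4:6] = -mu*y[1:3]/r/r/r
end

mu = 398600.4418                          # illustrative gravitational parameter
y  = [7000.0, 0.0, 0.0, 0.0, 7.5, 0.0]   # illustrative state: position + velocity
dy = zeros(6)                             # output buffer, allocated once

for i = 1:1000
    gravity!(dy, 0.0, y, mu)  # derivative written into dy, no fresh array returned
end
```

The win is that the integrator owns the storage; the callback only fills it, so the per-step allocation of a return array disappears.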
>>> What did help was rewriting the callback function like so:
>>>
>>>     function gravity1(t::Float64, y::Vector{Float64}, mu::Float64)
>>>         x, y, z, vx, vy, vz = y
>>>         r = sqrt(x*x+y*y+z*z)
>>>         r3 = r*r*r
>>>         [vx, vy, vz, -mu*x/r3, -mu*y/r3, -mu*z/r3]
>>>     end
>>>
>>> and especially reducing memory allocations by writing the intermediary
>>> solutions in the core integrator to a single large work array. This
>>> feels worse style-wise, since the function now has side effects, but
>>> the runtime is down to *1.7e-4s*.
>>>
>>> According to the sampling profiler, the remaining bottleneck is the
>>> allocation of the array that is returned by the callback function. I
>>> was thinking about returning a tuple instead, but that seems rather
>>> impractical because the data will be written to the work array
>>> afterwards. Other ideas?
>>>
>>> 2013/12/18 Tim Holy <[email protected]>
>>>
>>>> Also, I just added a new section
>>>> http://docs.julialang.org/en/latest/manual/performance-tips/#tools
>>>> that advertises the available tools for helping you diagnose
>>>> performance problems.
>>>>
>>>> Without taking the time to look at your code, I'll just add that
>>>> whenever I see an orders-of-magnitude discrepancy between C/Fortran
>>>> and Julia, my first instinct is to suspect a type problem. The fact
>>>> that a vectorized version is a bit faster than one written with loops
>>>> might also support this diagnosis.
>>>>
>>>> Best,
>>>> --Tim
>>>>
>>>> On Wednesday, December 18, 2013 03:26:11 AM Ivar Nesje wrote:
>>>> > My first suggestion to anyone trying to write fast Julia programs is
>>>> > to read http://docs.julialang.org/en/latest/manual/performance-tips/;
>>>> > those are all good tips that I do not think will become obsolete
>>>> > when Julia improves. It seems to me that you know those points.
>>>> >
>>>> > I think you get an important hint from the fact that devectorization
>>>> > does not matter.
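
For what it is worth, the tuple idea could look like the following hypothetical variant of gravity1 (gravity2 and the surrounding values are illustrative, not from the Gist): the tuple result avoids the heap-allocated return array, and the caller copies its elements into the preallocated work array by index:

```julia
function gravity2(t::Float64, y::Vector{Float64}, mu::Float64)
    x, yy, z, vx, vy, vz = y              # yy avoids shadowing the argument y
    r  = sqrt(x*x + yy*yy + z*z)
    r3 = r*r*r
    (vx, vy, vz, -mu*x/r3, -mu*yy/r3, -mu*z/r3)  # tuple, not a heap array
end

work = zeros(6)                           # preallocated work array
dy   = gravity2(0.0, [1.0, 0.0, 0.0, 0.0, 1.0, 0.0], 1.0)
for i = 1:6
    work[i] = dy[i]                       # copy elements into the work array
end
```

Whether the element-by-element copy eats the savings would have to be measured; this is only a sketch of the shape such a callback could take.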
>>>> > To me it seems like the current bottleneck is that you use an
>>>> > anonymous function instead of a regular function. When I replace
>>>> > "f(" with "gravity(", I get some improvement, and then your
>>>> > devectorization attempts make a significant difference. Further, you
>>>> > might want to try to reduce the amount of memory allocated, but that
>>>> > seems to complicate your code quite a bit.
>>>> >
>>>> > My improvements reduce the timing as follows for 1000 iterations:
>>>> > ivar@Ivar-ubuntu:~/tmp$ julia doptest.jl
>>>> > elapsed time: 0.878398771 seconds (513399840 bytes allocated)
>>>> > ivar@Ivar-ubuntu:~/tmp$ julia dopitest.jl
>>>> > elapsed time: 0.16916126 seconds (122423840 bytes allocated)
>>>> >
>>>> > On Wednesday, 18 December 2013 at 11:07:30 UTC+1, Helge Eichhorn wrote:
>>>> > > Hi,
>>>> > >
>>>> > > I spent the last few days porting the well-known DOP853
>>>> > > <http://www.unige.ch/~hairer/software.html> integrator to Julia.
>>>> > > The process was quite smooth and I have implemented the core
>>>> > > functionality. However, when I run my reference case, a numerical
>>>> > > solution of the two-body problem, I get the following timings:
>>>> > > - *Fortran* (gfortran 4.8.2, no optimizations): *~1.7e-5s*
>>>> > > - *Julia* (master, looped): *~1.3e-3s*
>>>> > > - *Julia* (master, vectorized): *~1e-3s (!)*
>>>> > >
>>>> > > I have posted the Julia code and the Fortran reference in this
>>>> > > Gist <https://gist.github.com/helgee/8019521>. The computationally
>>>> > > expensive part seems to be contained in the *dopcore* or
>>>> > > *dopcorevec* function, respectively. What I really do not
>>>> > > understand is why the vectorized expressions seem to run faster,
>>>> > > or rather what I am doing wrong here.
>>>> > >
>>>> > > Any ideas or suggestions? Many thanks in advance!
>>>> > > Helge
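
Ivar's "f(" -> "gravity(" replacement amounts to hard-coding the callback so the call site can specialize. A toy sketch of the two variants (the right-hand side and step functions are hypothetical, and the penalty described applies to 2013-era Julia; current versions specialize on function arguments):

```julia
gravity(t, y, mu) = -mu .* y              # toy right-hand side, illustrative only

# Callback passed as an argument: on 2013-era Julia each f(...) call
# was dynamically dispatched, which dominated the runtime.
step_generic(f::Function, t, y, mu, h) = y + h .* f(t, y, mu)

# Callback hard-coded into the integrator, as in the "f(" -> "gravity(" edit:
step_hard(t, y, mu, h) = y + h .* gravity(t, y, mu)
```

Hard-coding is of course no good for a library interface, which is why the metaprogramming tricks mentioned above would be needed to get a user-supplied callback back.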
