Another point about coding style: you use the `@inbounds` macro without explicitly checking bounds first. That might be acceptable for benchmarking and testing, but when you share your code online I think it is important to be aware of the danger of using this pattern in production code. An out-of-bounds access under `@inbounds` can silently corrupt memory, and memory corruption bugs are nearly impossible for most Julia users to track down. They will likely give Julia a bad reputation for being unstable.
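To make the point concrete, here is a minimal sketch of what I mean by checking bounds first. The function name `checked_sum` and its arguments are made up for illustration and are not taken from your gist, and the syntax assumes a recent Julia version:

```julia
# Hypothetical helper (not from the gist): sum the first n elements of x,
# disabling bounds checks only after the index range has been validated.
function checked_sum(x::AbstractVector{T}, n::Integer) where {T<:Number}
    # One explicit check up front: throws a BoundsError unless
    # every index in 1:n is valid for x.
    checkbounds(x, 1:n)
    s = zero(T)
    # Safe to skip per-access checks here, because 1:n was validated above.
    @inbounds for i in 1:n
        s += x[i]
    end
    return s
end
```

With this pattern, a call such as `checked_sum([1.0, 2.0, 3.0], 4)` throws a `BoundsError` instead of reading past the end of the array, and the single check outside the loop should cost very little compared to the loop itself.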
Ivar

On Monday, April 28, 2014 at 18:05:51 UTC+2, Kevin Squire wrote:
>
> Ok, gotcha.
>
> On Monday, April 28, 2014, John Travers <[email protected]> wrote:
>
>> No, I edited the output for clarity. The slowdown is consistent
>> regardless of order and amount of warmup. Ivar's fix of grouping into
>> threes using parentheses eliminates the problem.
>>
>> On Monday, April 28, 2014 3:51:00 PM UTC+2, Kevin Squire wrote:
>>>
>>> Please correct me if I'm wrong, but it looks like your first set of
>>> timings includes compilation time, since the amount of memory allocated
>>> is so high and you run right after using the file. Perhaps you can run
>>> it again with warmup?
>>>
>>> Kevin
>>>
>>> On Monday, April 28, 2014, John Travers <[email protected]> wrote:
>>>
>>>> You just beat me to it! Thanks!
>>>>
>>>> On Monday, April 28, 2014 3:41:36 PM UTC+2, Ivar Nesje wrote:
>>>>>
>>>>> Reported issue: https://github.com/JuliaLang/julia/issues/6681
>>>>>
>>>>> On Monday, April 28, 2014 at 13:56:29 UTC+2, Ivar Nesje wrote:
>>>>>>
>>>>>> It seems like Jeff was wrong in his statement in
>>>>>> 32384010f <https://github.com/JuliaLang/julia/commit/32384010fd689e0b6a77ee93b24613fb0bdb008f>.
>>>>>>
>>>>>> This discussion belongs in an issue on GitHub. Do you want to post
>>>>>> it there?
>>>>>>
>>>>>> You can also fix the problem a little more neatly by adding a ()
>>>>>> around 3 of the numbers.
>>>>>>
>>>>>> Ivar
>>>>>>
>>>>>> On Monday, April 28, 2014 at 13:38:30 UTC+2, John Travers wrote:
>>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I have found some odd performance scaling when summing and scaling
>>>>>>> more than three complex numbers, see the difference between sum5
>>>>>>> and sum5b in this gist: https://gist.github.com/jtravs/11368929
>>>>>>>
>>>>>>> Compare:
>>>>>>>
>>>>>>> julia> using testsums
>>>>>>> julia> dosums(Complex{Float64})
>>>>>>> elapsed time: 0.022001424 seconds (28800096 bytes allocated)
>>>>>>> elapsed time: 0.00194736 seconds (96 bytes allocated)
>>>>>>>
>>>>>>> With:
>>>>>>>
>>>>>>> julia> dosums(Float64)
>>>>>>> elapsed time: 0.000664517 seconds (96 bytes allocated)
>>>>>>> elapsed time: 0.000782516 seconds (96 bytes allocated)
>>>>>>>
>>>>>>> It seems that splitting the sum into a maximum of three operands
>>>>>>> greatly speeds up performance for Complex{Float64}, whereas it has
>>>>>>> no significant effect for Float64. Does anyone know why? I often
>>>>>>> have to sum and scale 5 or more arrays in my codes and it would be
>>>>>>> unfortunate to have to hand block them into sets of three like in
>>>>>>> sum5b in the gist.
