Hi all,
I have found some odd performance behaviour when summing and scaling more
than three complex numbers; see the difference between sum5 and sum5b in
this gist: https://gist.github.com/jtravs/11368929
Compare:
julia> using testsums
julia> dosums(Complex{Float64})
elapsed time: 0.022001424 seconds (28800096 bytes allocated)
elapsed time: 0.00194736 seconds (96 bytes allocated)
With:
julia> dosums(Float64)
elapsed time: 0.000664517 seconds (96 bytes allocated)
elapsed time: 0.000782516 seconds (96 bytes allocated)
It seems that splitting the sum so that each addition has at most three
operands greatly speeds things up for Complex{Float64} (about 11x here,
with allocations dropping from roughly 28.8 MB to 96 bytes), whereas it
has no significant effect for Float64. Does anyone know why? I often have
to sum and scale 5 or more arrays in my code, and it would be unfortunate
to have to block them by hand into sets of three, as in sum5b in the gist.
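
For anyone who does not want to open the gist, here is a minimal sketch of
the kind of grouping I mean. It is not the gist's exact code: the function
bodies, the 0.5 scale factor and the loop count n are assumptions made up
for illustration, and only the names sum5 and sum5b match the gist.

```julia
# Illustrative sketch only; bodies, scale factor and loop count are assumed.

# Five operands added in a single expression, then scaled.
function sum5(a, b, c, d, e, n)
    s = zero(a)
    for i in 1:n
        s += (a + b + c + d + e) * 0.5
    end
    return s
end

# Same arithmetic, grouped by hand so that no addition expression
# has more than three operands.
function sum5b(a, b, c, d, e, n)
    s = zero(a)
    for i in 1:n
        t = a + b + c
        s += (t + d + e) * 0.5
    end
    return s
end
```

Both versions should return the same value; timing each with @time for
Complex{Float64} and for Float64 arguments is the comparison I am
describing above.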