Hello,
While writing code for unit normal scaling, I found a big performance
difference depending on where the function passed to broadcast! is defined:
*globally* vs. *locally*. Consider the functions below:
function scun2!(A)
    shift = mean(A, 1)
    stretch = std(A, 1)
    f(a, b, c) = (a - b) / c # defined locally
    broadcast!(f, A, A, shift, stretch)
    shift, stretch
end
f_scun(a, b, c) = (a - b) / c # defined globally
function scun3!(A)
    shift = mean(A)
    stretch = std(A, 1)
    broadcast!(f_scun, A, A, shift, stretch)
    shift, stretch
end
The resulting performance is:
R2 = copy(T)
@time sh2, sc2 = scun2!(R2);
0.035527 seconds (19.51 k allocations: 967.273 KB)
R3 = copy(T)
@time sh3, sc3 = scun3!(R3);
0.009705 seconds (54 allocations: 17.547 KB)
How can it be explained that, when f_scun is defined outside the function,
performance is about 3.6 times better and the number of allocations is far
smaller (54 vs. 19.51 k)? I'm using Julia 0.4.3.
Thank you,
Igor