Thanks for your answer! I deleted the post you quoted and re-posted a complete version because I had posted it prematurely by accident; sorry about that. The commit in question is d9f7c2125831a16c2386888904f303846a1ced95.
Best,
Damien

On Wednesday, 14 October 2015 17:09:52 UTC+2, Yichao Yu wrote:
>
> On Wed, Oct 14, 2015 at 10:57 AM, Damien <[email protected]> wrote:
> > Hi all,
> >
> > I'm noticing a strange performance issue with expressions such as this one:
> >
> > n = 100000
> > a = zeros(Float32, n)
> > b = rand(Float32, n)
> > c = rand(Float32, n)
> >
> > function test(a, b, c)
> >     @simd for i in 1:length(a)
> >         @inbounds a[i] += b[i] * c[i] * (c[i] < b[i]) * (c[i] > b[i]) *
> >             (c[i] <= b[i]) * (c[i] >= b[i])
> >     end
> > end
> >
> > The problem depends on the number of statements in the expression and
> > whether the comparisons are explicitly cast to Float32.
> >
> > In Julia 0.4-rc4, I get the following:
> >
> > @inbounds a[i] += b[i] * c[i] * (c[i] < b[i]) * (c[i] > b[i]) *
> >     (c[i] <= b[i]) * (c[i] >= b[i])
> >
> >> test(a, b, c)
> >> @time test(a, b, c)
> >
> > 0.000143 seconds (4 allocations: 160 bytes)
> >
> > @inbounds a[i] += b[i] * (c[i] < b[i]) * (c[i] < b[i]) * (c[i] < b[i])
> >
> >> test(a, b, c)
> >> @time test(a, b, c)
> > 0.000004 seconds (4 allocations: 160 bytes)
> >
> > Four or more, loop is NOT vectorised:
> > @inbounds a[i] += b[i] * (c[i] < b[i]) * (c[i] < b[i]) * (c[i] < b[i]) *
> >     (c[i] < b[i])
> >
> >> test(a, b, c)
> >> @time test(a, b, c)
> > 0.000021 seconds (204 allocations: 3.281 KB)
> >
> > Explicit casts, loop is vectorised again:
> > @inbounds a[i] += b[i] * Float32(c[i] < b[i]) * Float32(c[i] < b[i]) *
> >     Float32(c[i] < b[i]) * Float32(c[i] < b[i])
> >
> >> test(a, b, c)
> >> @time test(a, b, c)
> >
> > 0.000003 seconds (4 allocations: 160 bytes)
> >
> > Julia Version 0.5.0-dev+769
> > Commit d9f7c21* (2015-10-14 12:03 UTC)
> > Platform Info:
> >   System: Darwin (x86_64-apple-darwin13.4.0)
> >   CPU: Intel(R) Core(TM) i7-2635QM CPU @ 2.00GHz
> >   WORD_SIZE: 64
> >   BLAS: libopenblas (DYNAMIC_ARCH NO_AFFINITY Sandybridge)
> >   LAPACK: libopenblas
> >   LIBM: libopenlibm
> >   LLVM: libLLVM-3.3
>
> The inlining is a little too fragile and you should check with
> @code_llvm if all the functions are inlined.
> I've also noticed that the SHA you give doesn't seem to be a valid
> commit on JuliaLang/julia so I couldn't check if the inlining fix is
> included.
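
For anyone following along, here is a rough sketch of the @code_llvm check suggested above, applied to the four-comparison case with and without the explicit casts. It assumes a Julia 1.x install, where code_llvm lives in InteractiveUtils; on the 0.4/0.5 builds discussed in this thread, running `@code_llvm test(a, b, c)` at the REPL should be the equivalent. The names test_implicit!, test_explicit!, llvm_ir and looks_vectorised are made up for this illustration, and searching the IR for vector types is only a heuristic, so the printed result will depend on the Julia version and CPU.

```julia
using InteractiveUtils  # provides code_llvm on Julia 1.x (assumption: not a 0.4 build)

# Four comparisons, relying on implicit Bool -> Float32 promotion.
function test_implicit!(a, b, c)
    @inbounds @simd for i in eachindex(a, b, c)
        a[i] += b[i] * (c[i] < b[i]) * (c[i] < b[i]) * (c[i] < b[i]) * (c[i] < b[i])
    end
    return a
end

# Same expression with every comparison converted to Float32 explicitly.
function test_explicit!(a, b, c)
    @inbounds @simd for i in eachindex(a, b, c)
        a[i] += b[i] * Float32(c[i] < b[i]) * Float32(c[i] < b[i]) *
                Float32(c[i] < b[i]) * Float32(c[i] < b[i])
    end
    return a
end

# Hypothetical helper: LLVM IR of f(a, b, c) for Vector{Float32} arguments, as a String.
function llvm_ir(f)
    T = Vector{Float32}
    return sprint(io -> code_llvm(io, f, Tuple{T, T, T}))
end

# Heuristic only: vectorised loops normally contain types such as <4 x float>.
looks_vectorised(ir) = occursin(r"<\d+ x float>", ir)

for f in (test_implicit!, test_explicit!)
    println(f, ": vector types found in IR = ", looks_vectorised(llvm_ir(f)))
end
```

If the implicit version reports no vector types, reading the full IR for leftover `call` instructions inside the loop body is the next step, since a call that was not inlined is what usually blocks vectorisation.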
