Hi all,

I'm noticing a strange performance issue with expressions such as this one:

n = 100000
a = zeros(Float32, n)
b = rand(Float32, n)
c = rand(Float32, n)

function test(a, b, c)
   @simd for i in 1:length(a)
       @inbounds a[i] += b[i] * c[i] * (c[i] < b[i]) * (c[i] > b[i]) * (c[i] <= b[i]) * (c[i] >= b[i])
   end
end

The problem is that both the run time and whether the loop vectorises at 
all depend on the number of comparisons in the expression and on whether 
the comparison results are explicitly cast to Float32.
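
To make the two spellings concrete, here is a minimal sketch of what "implicit" and "explicit" mean for a single factor. The variable names are made up for illustration. A Bool multiplied by a Float32 yields a Float32, so for finite values both forms give the same result:

```julia
x = 2.5f0

# Implicit: the Bool from the comparison is used directly as a factor.
y_implicit = x * (1.0f0 < 2.0f0)

# Explicit: the Bool is converted to Float32 before multiplying.
y_explicit = x * Float32(1.0f0 < 2.0f0)

# A false comparison zeroes the term in both spellings.
z_implicit = x * (2.0f0 < 1.0f0)
z_explicit = x * Float32(2.0f0 < 1.0f0)

println(typeof(y_implicit), " ", typeof(y_explicit))
println(y_implicit == y_explicit, " ", z_implicit == z_explicit)
```

So the two versions should differ only in how the compiler handles them, not in the values they produce for ordinary finite inputs.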

In Julia 0.4-rc4, I get the following:

@inbounds a[i] += b[i] * c[i] * (c[i] < b[i]) * (c[i] > b[i]) * (c[i] <= b[i])
> test(a, b, c)
> @time test(a, b, c) 
0.000169 seconds (4 allocations: 160 bytes)

@inbounds a[i] += b[i] * c[i] * (c[i] < b[i]) * (c[i] > b[i]) * (c[i] <= b[i]) * (c[i] >= b[i])
> test(a, b, c)
> @time test(a, b, c)
0.007258 seconds (200.00 k allocations: 3.052 MB, 47.59% gc time)

@inbounds a[i] += b[i] * c[i] * Float32(c[i] < b[i]) * Float32(c[i] > b[i]) * Float32(c[i] <= b[i]) * Float32(c[i] >= b[i])
> test(a, b, c)
> @time test(a, b, c)
0.000137 seconds (4 allocations: 160 bytes)

I get a similar behavior in the current 0.5 HEAD (Commit d9f7c21* with the 
fix for issue #13553) but the threshold for the number of comparisons is 
slightly different.

(a) Is it meant to be OK to use expressions like a[i] * (c[i] < b[i]), or 
should I always cast explicitly? I really like the implicit version, 
because it is very readable and a natural translation of equations 
involving cases.
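
For comparison, a third spelling is sketched below. It combines the comparisons with & into a single Bool first, so that only one Bool enters the product. The name test_mask is hypothetical, and I have not measured whether this variant vectorises; it is only meant to show a form that stays close to the "cases" notation.

```julia
# Hypothetical variant of test(): build one Bool mask per element,
# then multiply by it once instead of once per comparison.
function test_mask(a, b, c)
    @simd for i in 1:length(a)
        @inbounds begin
            mask = (c[i] < b[i]) & (c[i] > b[i]) & (c[i] <= b[i]) & (c[i] >= b[i])
            a[i] += b[i] * c[i] * mask
        end
    end
    return a
end

n = 1000
a = zeros(Float32, n)
b = rand(Float32, n)
c = rand(Float32, n)
test_mask(a, b, c)

# c[i] < b[i] and c[i] > b[i] can never both hold, so every term is zero.
println(all(x -> x == 0, a))
```

For finite inputs this should give the same values as the chained product, since a product of 0/1 factors is 1 exactly when all comparisons are true.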

(b) What is causing the vectorisation threshold observed here?
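
One observation that may or may not be related to (b): a chained product is parsed as a single call to * with all factors as arguments, not as nested two-argument multiplications. The sketch below only inspects the parsed expression, using placeholder symbols x1 to x6. Whether the argument count is what triggers the threshold here is an assumption I have not verified.

```julia
# Six factors, as in the slow version of the loop body.
chained = :(x1 * x2 * x3 * x4 * x5 * x6)

# The same product with explicit grouping into pairs.
grouped = :((x1 * x2) * (x3 * x4) * (x5 * x6))

# args[1] is the function (:*); the remaining entries are its arguments.
println(chained.head, " ", chained.args[1], " ", length(chained.args) - 1)
println(grouped.head, " ", grouped.args[1], " ", length(grouped.args) - 1)
```

The first expression is one call with six arguments, while the grouped one is a three-argument call whose arguments are themselves two-argument products.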

Best,
Damien
