Hi all,

I'm noticing a strange performance issue with expressions such as this one:

n = 100000
a = zeros(Float32, n)
b = rand(Float32, n)
c = rand(Float32, n)

function test(a, b, c)
    @simd for i in 1:length(a)
        @inbounds a[i] += b[i] * c[i] * (c[i] < b[i]) * (c[i] > b[i]) * (c[i] <= b[i]) * (c[i] >= b[i])
    end
end


The problem depends on the number of comparison terms in the expression and 
on whether the comparisons are explicitly cast to Float32.
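
One way to check whether a given variant is vectorised is to look for vector types such as <4 x float> in the generated LLVM IR. The following is only a minimal sketch: it assumes a recent Julia, where code_llvm is provided by InteractiveUtils, and the names test_casts and looks_vectorised are hypothetical helpers introduced for illustration, not part of the code above.

```julia
using InteractiveUtils  # provides code_llvm on recent Julia versions

# Variant with explicit Float32 casts, same loop shape as `test`.
function test_casts(a, b, c)
    @simd for i in 1:length(a)
        @inbounds a[i] += b[i] * Float32(c[i] < b[i]) * Float32(c[i] < b[i])
    end
    return a
end

# Hypothetical helper: true if the LLVM IR for f(argtypes...) contains
# a float vector type such as `<4 x float>` or `<8 x float>`.
function looks_vectorised(f, argtypes)
    ir = sprint(io -> code_llvm(io, f, argtypes))
    return occursin(r"<\d+ x float>", ir)
end

V = Vector{Float32}
println(looks_vectorised(test_casts, Tuple{V, V, V}))
```

The result depends on the Julia and LLVM versions and on the CPU, so this is a heuristic check rather than a guarantee.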

In Julia 0.4-rc4, I get the following:
        @inbounds a[i] += b[i] * c[i] * (c[i] < b[i]) * (c[i] > b[i]) * (c[i] <= b[i]) * (c[i] >= b[i])

> test(a, b, c)
> @time test(a, b, c)

0.000143 seconds (4 allocations: 160 bytes)


Three comparisons, loop is vectorised:
@inbounds a[i] += b[i] * (c[i] < b[i]) * (c[i] < b[i]) * (c[i] < b[i])

> test(a, b, c)
> @time test(a, b, c)
0.000004 seconds (4 allocations: 160 bytes)


Four or more, loop is NOT vectorised:
@inbounds a[i] += b[i] * (c[i] < b[i]) * (c[i] < b[i]) * (c[i] < b[i]) * (c[i] < b[i])
 

> test(a, b, c)
> @time test(a, b, c)
0.000021 seconds (204 allocations: 3.281 KB)


Explicit casts, loop is vectorised again:
@inbounds a[i] += b[i] * Float32(c[i] < b[i]) * Float32(c[i] < b[i]) * Float32(c[i] < b[i]) * Float32(c[i] < b[i])

> test(a, b, c)
> @time test(a, b, c)

0.000003 seconds (4 allocations: 160 bytes)
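
For anyone who wants to reproduce the comparison, here is a self-contained sketch that puts the four-comparison variant and the explicit-cast variant side by side. The function names test_bool4 and test_cast4 are hypothetical, chosen only for this example; the expressions are the ones quoted above. Timings will differ between machines and Julia versions.

```julia
# Four Bool factors multiplied directly (the slow case reported above).
function test_bool4(a, b, c)
    @simd for i in 1:length(a)
        @inbounds a[i] += b[i] * (c[i] < b[i]) * (c[i] < b[i]) * (c[i] < b[i]) * (c[i] < b[i])
    end
    return a
end

# Same expression with each comparison cast to Float32 explicitly.
function test_cast4(a, b, c)
    @simd for i in 1:length(a)
        @inbounds a[i] += b[i] * Float32(c[i] < b[i]) * Float32(c[i] < b[i]) * Float32(c[i] < b[i]) * Float32(c[i] < b[i])
    end
    return a
end

n = 100000
b = rand(Float32, n)
c = rand(Float32, n)

# Both variants should compute the same values; only the speed is in question.
# These first calls also trigger compilation, so they act as the warm-up.
r1 = test_bool4(zeros(Float32, n), b, c)
r2 = test_cast4(zeros(Float32, n), b, c)
println(r1 == r2)

a = zeros(Float32, n)
@time test_bool4(a, b, c)
@time test_cast4(a, b, c)
```

Checking that the two results agree first rules out the possibility that the cast changes the computed values rather than just the generated code.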



Julia Version 0.5.0-dev+769
Commit d9f7c21* (2015-10-14 12:03 UTC)
Platform Info:
  System: Darwin (x86_64-apple-darwin13.4.0)
  CPU: Intel(R) Core(TM) i7-2635QM CPU @ 2.00GHz
  WORD_SIZE: 64
  BLAS: libopenblas (DYNAMIC_ARCH NO_AFFINITY Sandybridge)
  LAPACK: libopenblas
  LIBM: libopenlibm
  LLVM: libLLVM-3.3



