Re: [julia-users] Strange performance problem with Float32*Bool inside @simd loop

Yichao Yu Wed, 14 Oct 2015 08:10:21 -0700

On Wed, Oct 14, 2015 at 10:57 AM, Damien <[email protected]> wrote:
> Hi all,
>
> I'm noticing a strange performance issue with expressions such as this one:
>
> n = 100000
> a = zeros(Float32, n)
> b = rand(Float32, n)
> c = rand(Float32, n)
>
> function test(a, b, c)
>     @simd for i in 1:length(a)
>         @inbounds a[i] += b[i] * c[i] * (c[i] < b[i]) * (c[i] > b[i]) *
> (c[i] <= b[i]) * (c[i] >= b[i])
>     end
> end
>
>
> The problem depends on the number of statements in the expression and
> whether the comparisons are explicitly cast to Float32.
>
> In Julia 0.4-rc4, I get the following:
>         @inbounds a[i] += b[i] * c[i] * (c[i] < b[i]) * (c[i] > b[i]) *
> (c[i] <= b[i]) * (c[i] >= b[i])
>
>> test(a, b, c)
>> @time test(a, b, c)
>
> 0.000143 seconds (4 allocations: 160 bytes)
>
>
>
>
> @inbounds a[i] += b[i] * (c[i] < b[i]) * (c[i] < b[i]) * (c[i] < b[i])
>
>> test(a, b, c)
>> @time test(a, b, c)
> 0.000004 seconds (4 allocations: 160 bytes)
>
>
> Four or more, loop is NOT vectorised: @inbounds a[i] += b[i] * (c[i] < b[i])
> * (c[i] < b[i]) * (c[i] < b[i]) * (c[i] < b[i])
>
>
>> test(a, b, c)
>> @time test(a, b, c)
> 0.000021 seconds (204 allocations: 3.281 KB)
>
>
> Explicit casts, loop is vectorised again: @inbounds a[i] += b[i] *
> Float32(c[i] < b[i]) * Float32(c[i] < b[i]) * Float32(c[i] < b[i]) *
> Float32(c[i] < b[i])
>
>> test(a, b, c)
>> @time test(a, b, c)
>
> 0.000003 seconds (4 allocations: 160 bytes)
>
>
>
> Julia Version 0.5.0-dev+769
> Commit d9f7c21* (2015-10-14 12:03 UTC)
> Platform Info:
>   System: Darwin (x86_64-apple-darwin13.4.0)
>   CPU: Intel(R) Core(TM) i7-2635QM CPU @ 2.00GHz
>   WORD_SIZE: 64
>   BLAS: libopenblas (DYNAMIC_ARCH NO_AFFINITY Sandybridge)
>   LAPACK: libopenblas
>   LIBM: libopenlibm
>   LLVM: libLLVM-3.3
>


The inlining is a little too fragile, so you should check with
@code_llvm whether all the functions are inlined.
I also noticed that the SHA you gave doesn't seem to be a valid
commit on JuliaLang/julia, so I couldn't check whether the inlining
fix is included.
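
For reference, a minimal sketch of that kind of check. It assumes a
Julia 1.x session (where @code_llvm and code_llvm live in
InteractiveUtils), not the 0.4/0.5 builds discussed in this thread.
The name test_cast! and the regex heuristics are made up for
illustration; the exact IR text varies between Julia and LLVM versions.

```julia
using InteractiveUtils  # assumption: Julia 1.x; supplies code_llvm / @code_llvm

# Explicit-cast variant from the original post (hypothetical name test_cast!).
function test_cast!(a, b, c)
    @simd for i in 1:length(a)
        @inbounds a[i] += b[i] * Float32(c[i] < b[i]) * Float32(c[i] < b[i]) *
                          Float32(c[i] < b[i]) * Float32(c[i] < b[i])
    end
    return a
end

n = 100000
a = zeros(Float32, n)
b = rand(Float32, n)
c = rand(Float32, n)

# Capture the LLVM IR as a string instead of printing it to the terminal.
argtypes = Tuple{Vector{Float32}, Vector{Float32}, Vector{Float32}}
ir = sprint(io -> code_llvm(io, test_cast!, argtypes))

# Heuristic 1: vector types such as <4 x float> suggest the loop was vectorised.
has_vector_ops = occursin(r"<\d+ x float>", ir)

# Heuristic 2: remaining `call` instructions are candidates for functions
# that were not inlined; inspect them by hand.
call_lines = filter(l -> occursin(r"\bcall\b", l), split(ir, '\n'))

println("vector ops present: ", has_vector_ops)
foreach(println, call_lines)
```

Running the same check on the version without the Float32 casts and
comparing the two listings should show whether the Bool
multiplications are left as calls inside the loop body.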

