But if branch prediction isn't a factor, what explains this:

```
julia> a = rand(5000);

julia> b = rand(5000);

julia> c = rand(5000) + 0.5;

julia> d = rand(5000) + 1;

julia> @time essai(200,a,b);
 14.607105 seconds (5 allocations: 1.922 KB)

julia> @time essai(200,a,c);
  8.357925 seconds (5 allocations: 1.922 KB)

julia> @time essai(200,a,d);
  3.159876 seconds (5 allocations: 1.922 KB)
```
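For reference, here is a minimal sketch of the branchy loop next to the branchless variant suggested downthread. The name `essai_branchless` is mine, not from the thread, and the code uses the current array-constructor syntax `Vector{Int64}(undef, n)` rather than the 2016 form `Vector{Int64}(n)`; both functions also return `a`, which the original snippet did not.

```julia
# Branchy version: the inner loop's timing depends on how predictable
# `ss1 > ss2` is for the given data, and LLVM may fail to vectorize it.
function essai(n, s1, s2)
    a = Vector{Int64}(undef, n)
    @inbounds for k = 1:n
        ak = 0
        for ss1 in s1, ss2 in s2
            if ss1 > ss2
                ak += 1
            end
        end
        a[k] = ak
    end
    return a
end

# Branchless variant: the Bool result of the comparison (0 or 1) is
# added directly, a form LLVM can typically turn into SIMD code.
function essai_branchless(n, s1, s2)
    a = Vector{Int64}(undef, n)
    @inbounds for k = 1:n
        ak = 0
        for ss1 in s1, ss2 in s2
            ak += ss1 > ss2   # equivalently: ak += ifelse(ss1 > ss2, 1, 0)
        end
        a[k] = ak
    end
    return a
end
```

Both functions compute the same counts, so comparing `@time` on each with the `b`, `c`, and `d` inputs above separates the cost of the branch itself from the cost of the comparison.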


On Friday, September 9, 2016 at 12:53:46 AM UTC+2, Yichao Yu wrote:
>
> Shape is irrelevant since it doesn't affect the order in the loop at all.
>
> Branch prediction is not the issue here.
>
> The issue is optimizing memory access and simd.
>
> It is illegal to optimize the original code into `a[k] += ss1 > ss2`. It 
> is legal to optimize the `if ss1 > ss2 ak += 1 end` version to `ak += ss1 > 
> ss2` and this is the optimization LLVM should do but doesn't in this case.
>
> Also, the thing to look for to check if there's vectorization in llvm ir 
> is the vector type in the loop body like
>
> ```
>   %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
>   %offset.idx = or i64 %index, 1
>   %20 = add i64 %offset.idx, -1
>   %21 = getelementptr i64, i64* %19, i64 %20
>   %22 = bitcast i64* %21 to <4 x i64>*
>   store <4 x i64> zeroinitializer, <4 x i64>* %22, align 8
>   %23 = getelementptr i64, i64* %21, i64 4
>   %24 = bitcast i64* %23 to <4 x i64>*
>   store <4 x i64> zeroinitializer, <4 x i64>* %24, align 8
>   %25 = getelementptr i64, i64* %21, i64 8
>   %26 = bitcast i64* %25 to <4 x i64>*
>   store <4 x i64> zeroinitializer, <4 x i64>* %26, align 8
>   %27 = getelementptr i64, i64* %21, i64 12
>   %28 = bitcast i64* %27 to <4 x i64>*
>   store <4 x i64> zeroinitializer, <4 x i64>* %28, align 8
>   %index.next = add i64 %index, 16
>   %29 = icmp eq i64 %index.next, %n.vec
> ```
>
> having a BB named `vector.body` doesn't mean the loop is vectorized.
>
>
>
> On Thu, Sep 8, 2016 at 6:40 PM, 'Greg Plowman' via julia-users <
> [email protected]> wrote:
>
>> The difference is probably simd.
>>
>> the branched code will not use simd.
>>
>> Either of these should eliminate the branch and allow simd. 
>> ak += ss1>ss2
>> ak += ifelse(ss1>ss2, 1, 0)
>>
>> Check with @code_llvm, look for section vector.body
>>
>>
>>  at 5:45:30 AM UTC+10, Dupont wrote:
>>
>>> What is strange to me is that this is much slower
>>>
>>>
>>> function essai(n, s1, s2)
>>>     a = Vector{Int64}(n)
>>>
>>>     @inbounds for k = 1:n
>>>         ak = 0
>>>         for ss1 in s1, ss2 in s2
>>>             if ss1 > ss2
>>>                 ak += 1
>>>             end
>>>         end
>>>         a[k] = ak
>>>     end
>>>     return a
>>> end
>>>
>>>
>>>
>
