A great tool for figuring out what is going on in cases like this is 
`@code_llvm`. It shows you a representation of your code that is still 
readable, but very close to the machine.
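
As a minimal sketch of how to call it (the function name `scale_strided!` and the array size are made up for illustration; on Julia 0.7 and later a script also needs `using InteractiveUtils`, while the REPL loads it automatically):

```julia
using InteractiveUtils  # provides @code_llvm outside the REPL (Julia 0.7+)

# Hypothetical stand-in for the strided loop discussed in this thread.
function scale_strided!(A, b, stride, N)
    N = min(N, length(A))
    for i in 1:stride:N
        @inbounds A[i] *= b
    end
    return A
end

A = ones(8)
# Prints the LLVM IR generated for exactly these argument types.
# The macro only inspects the call; it does not run the function.
@code_llvm scale_strided!(A, 2.0, 2, length(A))
```

Vectorized loops usually show up in the output as operations on types such as `<4 x double>`; if you only see scalar `double` loads and stores inside the loop body, the loop was not vectorized.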

Your simple Julia code without `@simd` is nearly optimal, but it does 
benefit from the inclusion of `@inbounds`:


   - While loop with `@inbounds`: minimum time: 797.28 μs
   - While loop without `@inbounds`: minimum time: 1.01 ms
   - For loop without/with `@inbounds`: minimum time: 802/812.11 μs
   
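# For loop over a strided range, with bounds checks disabled.
function simple(A, b, stride, N)
  N = min(N, length(A))
  for i in 1:stride:N
    @inbounds A[i] *= b
  end
end

# Equivalent while loop; bounds checks are still active here.
function while_based(A, b, stride, N)
  i = 1
  N = min(N, length(A))
  while i <= N
    A[i] *= b
    i += stride
  end
end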
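
If you want to reproduce this kind of comparison yourself, here is a rough sketch using only `@elapsed` from Base. The helper `min_time`, the renamed functions, the array size and the stride are all my own assumptions for illustration and are not the setup that produced the numbers above; a dedicated benchmarking package will give more reliable results.

```julia
# Hypothetical re-statements of the two loops so this snippet runs on its own.
function for_inbounds!(A, b, stride, N)
    N = min(N, length(A))
    for i in 1:stride:N
        @inbounds A[i] *= b
    end
    return A
end

function while_checked!(A, b, stride, N)
    i = 1
    N = min(N, length(A))
    while i <= N
        A[i] *= b   # bounds check stays active here
        i += stride
    end
    return A
end

# Minimum wall-clock time in seconds over `reps` runs, each on a fresh copy.
# The first run includes compilation, which taking the minimum filters out.
function min_time(f, A, b, stride, N; reps = 20)
    best = Inf
    for _ in 1:reps
        B = copy(A)
        t = @elapsed f(B, b, stride, N)
        best = min(best, t)
    end
    return best
end

A = rand(10^6)  # assumed problem size
# Sanity check: both loops must compute the same result.
@assert for_inbounds!(copy(A), 2.0, 3, length(A)) ==
        while_checked!(copy(A), 2.0, 3, length(A))

println("for + @inbounds: ", min_time(for_inbounds!, A, 2.0, 3, length(A)), " s")
println("while, checked:  ", min_time(while_checked!, A, 2.0, 3, length(A)), " s")
```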


Now to the question of whether `@simd` is beneficial in this case. LLVM 
has a loop vectorizer that we run, and it performs a cost-benefit (and 
correctness) analysis whenever it sees a loop. The fact that we don't see 
vectorized code in the `@code_llvm` output means that LLVM did not deem it 
worthwhile to vectorize our code (as Kristoffer said, most likely because 
of the non-unit strides). With `@simd` we (forcibly) tell LLVM to vectorize 
our code, to be less strict about correctness, and to skip the cost-benefit 
analysis. While vectorized code can have great performance benefits, it 
also comes with costs (code size increase, overhead).
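
One way to check this yourself is to compare the IR of the loop with and without `@simd`. The sketch below is only an illustration under my own assumptions: the function names are made up, and searching the IR text for `<N x double>` is a crude heuristic of mine, not an official test. What it prints depends on your Julia version and CPU.

```julia
using InteractiveUtils  # provides code_llvm outside the REPL (Julia 0.7+)

# Hypothetical plain version of the strided loop.
function plain_strided!(A, b, stride, N)
    N = min(N, length(A))
    for i in 1:stride:N
        @inbounds A[i] *= b
    end
    return A
end

# Same loop, annotated with @simd.
function simd_strided!(A, b, stride, N)
    N = min(N, length(A))
    @simd for i in 1:stride:N
        @inbounds A[i] *= b
    end
    return A
end

# Heuristic: does the IR text mention an LLVM vector type like `<4 x double>`?
function mentions_vectors(f)
    ir = sprint(code_llvm, f, (Vector{Float64}, Float64, Int, Int))
    return occursin(r"<\d+ x double>", ir)
end

A = rand(1000)
# Each element is scaled independently, so both versions must agree exactly.
@assert plain_strided!(copy(A), 2.0, 3, 1000) == simd_strided!(copy(A), 2.0, 3, 1000)

println("plain loop mentions vector types: ", mentions_vectors(plain_strided!))
println("@simd loop mentions vector types: ", mentions_vectors(simd_strided!))
```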

I hope this rough analysis helps.


On Friday, 29 July 2016 22:36:50 UTC+9, Kristoffer Carlsson wrote:
>
> It is likely because the ranges are not UnitRanges.
>
> On Friday, July 29, 2016 at 5:35:57 AM UTC-4, Andreas Lobinger wrote:
>>
>> Hello colleague,
>>
>> On Friday, July 29, 2016 at 8:59:36 AM UTC+2, Juan Lopez wrote:
>>>
>>> Hello,
>>>
>>> I have a function which is doing basically an operation inside a loop 
>>> and when adding @simd or @inbounds time doesn't improve, in any case it 
>>> seems slightly worse.
>>>
>>  
>>
>>> Is there an explanation for this? Thank you
>>>
>>
>> there is a non-vanishing probability that the plain loop is already 
>> compiled to the optimal code. Maybe you could try looking at the lowered code.
>>
>>  
>>
>
