The magic of @inbounds and @simd :)
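
As far as I understand it, @inbounds skips the array bounds checks and @simd allows the compiler to reorder the additions so the loop can be vectorized, while ifelse makes the accumulation unconditional. Here is a minimal sketch comparing the two loop styles. The names branch_sum and select_sum are hypothetical (they are not from the thread), and I fixed the element type to Float64 to keep it short:

```julia
# Branching form: y[i] is only added when the condition holds.
function branch_sum(x::Vector{Float64}, y::Vector{Float64}, Ei::Float64, Ef::Float64)
    s = 0.0
    for i in eachindex(x, y)
        if Ei < x[i] <= Ef
            s += y[i]
        end
    end
    return s
end

# Select form: every iteration adds either y[i] or 0.0, chosen by ifelse.
function select_sum(x::Vector{Float64}, y::Vector{Float64}, Ei::Float64, Ef::Float64)
    s = 0.0
    @inbounds @simd for i in eachindex(x, y)
        s += ifelse(Ei < x[i] <= Ef, y[i], 0.0)
    end
    return s
end

x = collect(0.0:0.001:0.999)
y = rand(1000)
# Compare with a tolerance, not ==, because @simd may reorder the additions.
@assert isapprox(branch_sum(x, y, 0.2, 0.7), select_sum(x, y, 0.2, 0.7))
```

Whether the loop really gets vectorized depends on the Julia version and the CPU, so the speedup should be measured rather than assumed.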

Thanks, Kristoffer!

Charles

On Wednesday, December 30, 2015, Kristoffer Carlsson <kcarlsso...@gmail.com>
wrote:

> If you want to get an even faster version you could do something like:
>
> function calcSum_simd{T}(x::Vector{T}, y::Vector{T}, Ei::T, Ef::T)
>     mysum = zero(T)
>     @inbounds @simd for i in eachindex(x, y)
>         mysum += ifelse(Ei < x[i] <= Ef, y[i], zero(T))
>     end
>     return mysum
> end
>
> which would use SIMD instructions.
>
> Timing difference:
>
> N = 10000000
> y = rand(N);
> x = rand(N)
> Ei = 0.2;
> Ef = 0.7;
>
> julia> @time calcSum_simd(x,y,Ei, Ef);
>   0.021155 seconds (5 allocations: 176 bytes)
>
>
> julia> @time calcSum(x,y,Ei, Ef)
>   0.069911 seconds (5 allocations: 176 bytes)
>
>
> Regarding map being slow: that is being worked on here:
> https://github.com/JuliaLang/julia/pull/13412
>
>
> On Wednesday, December 30, 2015 at 3:05:47 AM UTC+1, Charles Santana wrote:
>>
>> Sorry, there was a typo in the function calcSum2. Please consider the
>> following code:
>>
>> function calcSum2(x::Array{Float64,1}, y::Array{Float64,1}, Ei::Float64,
>> Ef::Float64, N::Int64)
>>
>>         return sum(y[map(v -> Ei < v <= Ef, x)]);
>> end
>>
>>
>> And so the results of the calls for this function change a bit (but not
>> the performance):
>>
>>         @time calcSum2(x,y,Ei,Ef,N)
>>           0.000110 seconds (1.01 k allocations: 20.969 KB)
>>         246.1975746121703
>>
>>         @time calcSum2(x,y,Ei,Ef,N)
>>           0.000079 seconds (1.01 k allocations: 20.969 KB)
>>         246.1975746121703
>>
>>         @time calcSum2(x,y,Ei,Ef,N)
>>           0.000051 seconds (1.01 k allocations: 20.969 KB)
>>         246.1975746121703
>>
>>
>> Thanks again, sorry for this inconvenience!
>>
>> Charles
>>
>> On 30 December 2015 at 03:00, Charles Novaes de Santana <
>> charles...@gmail.com> wrote:
>>
>>> Dear all,
>>>
>>> In a project I am developing, @profile shows me that the slowest part
>>> of the code is summing the elements of an Array that satisfy a condition.
>>>
>>> Please consider the following code:
>>>
>>>         y = rand(1000);
>>>         x = collect(0.0:0.001:0.999);
>>>         Ei = 0.2;
>>>         Ef = 0.7;
>>>         N = length(x)
>>>
>>> I want to calculate the sum of the elements of "y" whose corresponding
>>> values in "x" lie between "Ei" and "Ef". If I were using R, for
>>> example, I would use something like:
>>>
>>> mysum = sum(y[which((x < Ef) & (x > Ei))]); #(not tested in R, but I
>>> suppose that is the way to do it)
>>>
>>> In Julia, I can think of at least two ways to calculate it:
>>>
>>> function calcSum(x::Array{Float64,1}, y::Array{Float64,1}, Ei::Float64,
>>> Ef::Float64, N::Int64)
>>>         mysum = 0.0
>>>         for i in 1:N
>>>              if Ei < x[i] <= Ef
>>>                  mysum += y[i]
>>>              end
>>>         end
>>>         return mysum
>>> end
>>>
>>> function calcSum2(x::Array{Float64,1}, y::Array{Float64,1}, Ei::Float64,
>>> Ef::Float64, N::Int64)
>>>         return sum(y[map(v -> Ei < v < Ef, x)]);
>>> end
>>>
>>> As you can see below, the first function (calcSum) performs much
>>> better than the second one (at least 10x faster).
>>>
>>>
>>>          @time calcSum(x,y,Ei,Ef,N)
>>>           0.003986 seconds (2.56 k allocations: 125.168 KB)
>>>         246.19757461217014
>>>
>>>         @time calcSum(x,y,Ei,Ef,N)
>>>           0.000003 seconds (5 allocations: 176 bytes)
>>>         246.19757461217014
>>>
>>>         @time calcSum(x,y,Ei,Ef,N)
>>>           0.000002 seconds (5 allocations: 176 bytes)
>>>         246.19757461217014
>>>
>>>         @time calcSum2(x,y,Ei,Ef,N)
>>>           0.003762 seconds (1.61 k allocations: 53.743 KB)
>>>         245.48156534879303
>>>
>>>         @time calcSum2(x,y,Ei,Ef,N)
>>>           0.000050 seconds (1.01 k allocations: 20.969 KB)
>>>         245.48156534879303
>>>
>>>         @time calcSum2(x,y,Ei,Ef,N)
>>>           0.000183 seconds (1.01 k allocations: 20.969 KB)
>>>         245.48156534879303
>>>
>>> Does anyone have an idea about how to improve the performance here?
>>>
>>> Many thanks for any help! Happy new year to all of you!
>>>
>>> Charles
>>>
>>>
>>>
>>>
>>> --
>>> Um axé! :)
>>>
>>> --
>>> Charles Novaes de Santana, PhD
>>> http://www.imedea.uib-csic.es/~charles
>>>
>>
>>
>>
>>
>

