That's interesting!

On Sunday, March 15, 2015 at 4:18:34 PM UTC+1, Dallas Morisette wrote:
>
> Thanks everyone for the suggestions! Here is my updated test:
>
> using TimeIt
> function vec!(x,y)
>     y = x.*x  # allocates a new array and rebinds the local y; the caller's y is unchanged
> end
>
> function comp!(x,y)
>     y = [xi*xi for xi in x]  # builds a new array and rebinds the local y; the caller's y is unchanged
> end
>
> function forloop!(x,y,n)
>     for i = 1:n 
>         y[i] = x[i]*x[i]
>     end
> end
>
> function forloop2!(x,y,n)
>     @simd for i = 1:n 
>         @inbounds y[i] = x[i]*x[i] 
>     end
> end
>     
> function test()
>     n = 10000
>     x = linspace(0.0,1.0,n)
>     y = zeros(x)
>     @timeit vec!(x,y)
>     @timeit comp!(x,y)
>     @timeit forloop!(x,y,n)
>     @timeit forloop2!(x,y,n)
> end
> test();
>
> 10000 loops, best of 3: 87.82 µs per loop
> 1000 loops, best of 3: 62.73 µs per loop
> 10000 loops, best of 3: 12.66 µs per loop
> 100000 loops, best of 3: 3.54 µs per loop
>
>
> So the @simd and @inbounds macros combined with an explicit for loop give 
> performance essentially equivalent to a call to numpy. I switched from 
> @timeit to @time so I could see the allocations:
>
> elapsed time: 2.467e-5 seconds (80512 bytes allocated)
> elapsed time: 2.1358e-5 seconds (80048 bytes allocated)
> elapsed time: 1.5124e-5 seconds (0 bytes allocated)
> elapsed time: 6.108e-6 seconds (0 bytes allocated)
>
>
> Looks like one temporary array (about 80 kB, i.e. 10000 Float64 values) is 
> allocated in both the vectorized and comprehension forms, which makes them 
> about 5-7X slower than the plain for loop in the @timeit results. I suppose 
> this would depend on the exact calculation being done and the size of the 
> arrays involved, and would have to be tested on a case-by-case basis. 
>
> Thanks for the help - I'm sure I'll be back with more questions!
>
> Dallas
>
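
A minimal sketch of the in-place point, assuming a current Julia 1.x release
rather than the version used in the quoted post. The function names below are
made up for illustration and do not come from the thread. It contrasts
rebinding the local y (as vec! and comp! do) with writing into the caller's
array, either through a fused dotted assignment or an explicit loop like
forloop2!:

```julia
# Assumes Julia 1.x. All function names here are illustrative only.

# Rebinds the local name `y` to a freshly allocated array;
# the caller's array is left untouched.
function square_rebind!(x, y)
    y = x .* x
    return y
end

# Writes into the caller's `y`. The dotted assignment fuses with the
# multiplication, so no temporary array is created for the result.
function square_fused!(x, y)
    y .= x .* x
    return y
end

# Explicit loop in the style of forloop2!, sized from the arrays themselves
# instead of a separate length argument.
function square_loop!(x, y)
    @inbounds @simd for i in eachindex(x, y)
        y[i] = x[i] * x[i]
    end
    return y
end

x = collect(range(0.0, stop=1.0, length=10_000))
y = zeros(length(x))

square_rebind!(x, y)   # y is still all zeros after this call
square_fused!(x, y)    # y now holds the squares of x
```

To compare allocations per call, one option is to wrap each call in
@allocated after a warm-up call, so that compilation is not counted.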
