That's interesting!
On Sunday, March 15, 2015 at 4:18:34 PM UTC+1, Dallas Morisette wrote:
>
> Thanks everyone for the suggestions! Here is my updated test:
>
> using TimeIt
> function vec!(x,y)
>     y = x.*x
> end
>
> function comp!(x,y)
>     y = [xi*xi for xi in x]
> end
>
> function forloop!(x,y,n)
>     for i = 1:n
>         y[i] = x[i]*x[i]
>     end
> end
>
> function forloop2!(x,y,n)
>     @simd for i = 1:n
>         @inbounds y[i] = x[i]*x[i]
>     end
> end
>
> function test()
>     n = 10000
>     x = linspace(0.0,1.0,n)
>     y = zeros(x)
>     @timeit vec!(x,y)
>     @timeit comp!(x,y)
>     @timeit forloop!(x,y,n)
>     @timeit forloop2!(x,y,n)
> end
> test();
>
> 10000 loops, best of 3: 87.82 µs per loop
> 1000 loops, best of 3: 62.73 µs per loop
> 10000 loops, best of 3: 12.66 µs per loop
> 100000 loops, best of 3: 3.54 µs per loop
>
>
> So the SIMD macros combined with a literal for loop give performance
> essentially equivalent to a call to numpy. I switched to @time so I could
> see the allocations:
>
> elapsed time: 2.467e-5 seconds (80512 bytes allocated)
> elapsed time: 2.1358e-5 seconds (80048 bytes allocated)
> elapsed time: 1.5124e-5 seconds (0 bytes allocated)
> elapsed time: 6.108e-6 seconds (0 bytes allocated)
>
>
> Looks like one temporary array has to be allocated in both vectorized and
> comprehension forms, which reduced the performance by about 5-7X. I suppose
> this would depend on the exact calculation being done and the size of the
> arrays involved and would have to be tested on a case-by-case basis.
>
> Thanks for the help - I'm sure I'll be back with more questions!
>
> Dallas
>
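
A note on the temporary array in the quoted test: `y = x.*x` inside `vec!` builds a new array and rebinds the local name `y`, so the caller's `y` is never written to (the same holds for `comp!`). Below is a minimal sketch of an in-place alternative. It assumes a current Julia 1.x release, where dotted assignment `.=` fuses the broadcast into the existing array; that syntax postdates the quoted 2015 message. The function names `vec_rebind!` and `vec_inplace!` are made up for illustration and are not from the thread.

```julia
# Sketch only: assumes Julia 1.x. Function names are hypothetical.

# Same pattern as the quoted vec!: allocates a fresh array and rebinds
# the local variable y, leaving the caller's array untouched.
function vec_rebind!(x, y)
    y = x .* x
    return y
end

# Dotted assignment writes each product into the existing array y,
# so no temporary result array is needed.
function vec_inplace!(x, y)
    y .= x .* x
    return y
end

# Helper so the allocation measurement runs inside a function,
# after a warm-up call that triggers compilation.
function measure(x, y)
    vec_inplace!(x, y)
    return @allocated vec_inplace!(x, y)
end

x = collect(range(0.0, stop=1.0, length=10_000))
y = zeros(length(x))

vec_rebind!(x, y)
println("caller's y still all zeros after vec_rebind!: ", all(iszero, y))

vec_inplace!(x, y)
println("caller's y updated by vec_inplace!: ", y == x .* x)

println("bytes allocated by vec_inplace!: ", measure(x, y))
```

Whether this closes the gap to the `@simd` loop would still need to be timed for the array sizes in question.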
