You can also get more performance with the @simd and @inbounds macros:
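using TimeIt  # provides @timeit

# Plain loop: y[i] = x[i]^2
function x2!(x,y)
    for i = 1:length(x)
        y[i] = x[i]^2
    end
    return y
end

# Same loop with bounds checking disabled (@inbounds) and SIMD
# vectorization allowed (@simd); assumes length(y) >= length(x)
function x2simd!(x,y)
    @simd for i = 1:length(x)
        @inbounds y[i] = x[i]^2
    end
    return y
end

# All benchmarks run inside a function rather than at global scope
function test(n)
    y = zeros(n)
    x = linspace(0., 1., n)
    println("x2! test")
    @timeit x2!(x,y)
    println("simd x2! test")
    y = zeros(n)
    @timeit x2simd!(x,y)
    println("dallas test")
    y = zeros(n)
    @timeit y = x.^2
    @timeit y = [xi^2 for xi in x]
end
test(10^4)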
==>
x2! test
10000 loops, best of 3: 10.91 µs per loop
simd x2! test
100000 loops, best of 3: 3.30 µs per loop
dallas test
1000 loops, best of 3: 125.95 µs per loop
1000 loops, best of 3: 67.05 µs per loop
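For anyone on a newer Julia, here is a rough sketch of the same four variants for Julia 1.x, assuming only Base is available: `linspace` has since been replaced by `range`, and `besttime`, `compare`, `x2broadcast` and `x2comprehension` are hypothetical helpers standing in for TimeIt's `@timeit` (BenchmarkTools.jl's `@btime` is the more usual tool). Everything runs inside functions, which is the other change the `test(n)` wrapper above makes besides `@simd`/`@inbounds`: Julia's performance tips warn against untyped global variables, and the comprehension in particular goes from 8.57 ms at global scope to 67.05 µs inside `test(n)`, although those two timings come from different machines.

```julia
# Sketch for Julia 1.x using only Base. Assumptions: `range` replaces the
# 0.3-era `linspace`, and the hypothetical helpers `besttime` and `compare`
# replace TimeIt's `@timeit`.

# Plain loop: writes x[i]^2 into the preallocated y.
function x2!(x, y)
    for i in eachindex(x, y)  # throws DimensionMismatch if shapes differ
        y[i] = x[i]^2
    end
    return y
end

# Same loop with bounds checks removed and SIMD vectorization allowed.
# @inbounds is safe here only because eachindex(x, y) checked the shapes.
function x2simd!(x, y)
    @inbounds @simd for i in eachindex(x, y)
        y[i] = x[i]^2
    end
    return y
end

# The two versions from the original question, wrapped in (hypothetical)
# functions so they are compiled for concrete argument types.
x2broadcast(x) = x .^ 2                  # allocates a new array
x2comprehension(x) = [xi^2 for xi in x]  # allocates a new array

# Best-of-`reps` wall time in seconds, after one warm-up call so that
# compilation time is excluded.
function besttime(f, args...; reps = 100)
    f(args...)
    best = Inf
    for _ in 1:reps
        best = min(best, @elapsed f(args...))
    end
    return best
end

function compare(n)
    x = collect(range(0.0, 1.0, length = n))
    y = zeros(n)
    cases = (("x2!", x2!, (x, y)),
             ("x2simd!", x2simd!, (x, y)),
             ("broadcast", x2broadcast, (x,)),
             ("comprehension", x2comprehension, (x,)))
    for (name, f, args) in cases
        t = besttime(f, args...)
        println(rpad(name, 14), round(t * 1e6, digits = 2), " µs")
    end
    # Return each variant's result so they can be checked against each other.
    return (loop = x2!(x, zeros(n)), simd = x2simd!(x, zeros(n)),
            broadcast = x2broadcast(x), comprehension = x2comprehension(x))
end

r = compare(10^4)
```

The absolute numbers will depend on the machine and Julia version; the point of the sketch is only that all four variants compute the same result and are timed from inside functions after compilation.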
On Saturday, March 14, 2015 at 11:10:29 PM UTC-5, Dallas Morisette wrote:
>
> I am very new to Julia. I'm working on adding some features to a fairly
> simple Fortran simulation, and decided to try writing it in Python to make
> it easier to explore variations. After a lot of optimization work I got it
> to within about 8x of the Fortran code's speed. I had read about Julia and
> had wanted a reason to try it, so I thought I'd see if I could get closer
> to Fortran speeds in Julia. My initial results were depressingly slow
> (140x slower than Fortran and 17x slower than Python), so before trying to
> optimize it I ran some very simple benchmarks to understand how to get
> good performance from Julia. One of them compared two ways of squaring
> each element of a 10,000-element array: one vectorized, and one with a
> for loop. I fully expected a large difference in Python's performance
> between the two, but I didn't expect Julia to be slower than Python in
> BOTH cases. I also expected the for-loop and vectorized versions to
> perform similarly, or the for loop to be faster, given what I'd read
> about devectorizing Julia code.
>
> I'm sure I'm doing something wrong, but can someone point out what?
>
> Here are the results:
>
> # Python Version
> import numpy as np
> n = 10000
> x = np.linspace(0.0,1.0,n)
> y = np.zeros_like(x)
> %timeit y = x**2
> %timeit y = [xi**2 for xi in x]
> 100000 loops, best of 3: 5.42 µs per loop
> 100 loops, best of 3: 3.19 ms per loop
>
>
>
> # Julia Version
> using TimeIt
> n = 10000
> x = linspace(0.0,1.0,n)
> y = zeros(x)
> @timeit y = x.^2
> @timeit y = [xi^2 for xi in x]
> 1000 loops, best of 3: 433.29 µs per loop
> 100 loops, best of 3: 8.57 ms per loop
>
>
> Thanks!
>
> Dallas Morisette
>
>