Lex is right that the devectorized form [xi^2 for xi in x] suffers from not
being in a function. However, the x.^2 form is already just a function call,
so it shouldn’t benefit much from being wrapped in a function. Note that one
of the other performance tips for Julia v0.3 is to use x.*x instead of x.^2
for elementwise squaring (I think this will no longer be the case in v0.4).
On Julia v0.3.6:

julia> x = rand(10000);

julia> @timeit x.^2
1000 loops, best of 3: 100.46 µs per loop

julia> @timeit x.*x
10000 loops, best of 3: 30.15 µs per loop

I get about the same timing as the second version when writing a custom
function with @simd and @inbounds.

In Python:

In [19]: from numpy.random import rand

In [20]: x = rand(10000);

In [21]: %timeit x**2
100000 loops, best of 3: 4.47 µs per loop

So, I’m still seeing a difference of about a factor of 6 between numpy and
Julia here (though I found that the difference depends on array size and is
generally smaller). I’m curious what causes the difference in this case.
Alex’s email shows that a lot of the time goes to allocating the result
array (his x2! and x2simd! functions don’t include the allocation), but I
think the above is a “fair” comparison, since Python also allocates a result
array to perform x**2.
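To make the allocation cost visible on the numpy side as well, one can compare the allocating expression against a version that writes into a preallocated buffer via numpy's `out=` argument. This is just a sketch to separate the two costs; absolute timings will of course vary by machine:

```python
import numpy as np
from timeit import timeit

x = np.random.rand(10000)
y = np.empty_like(x)  # preallocated output buffer

# Allocating version: numpy creates a fresh result array on every call.
t_alloc = timeit(lambda: x * x, number=10000)

# In-place version: reuses the preallocated buffer, no per-call allocation.
t_inplace = timeit(lambda: np.multiply(x, x, out=y), number=10000)

print(f"allocating: {t_alloc:.4f} s, in-place: {t_inplace:.4f} s")

# Both compute the same values.
np.multiply(x, x, out=y)
assert np.allclose(y, x**2)
```

On my understanding, the gap between the two numbers is roughly the allocation (plus garbage-collection) overhead, which is the same component Alex's x2! and x2simd! functions exclude on the Julia side.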

Kyle

On Sat, Mar 14, 2015 at 9:24 PM, <[email protected]> wrote:

> Read
> http://docs.julialang.org/en/latest/manual/performance-tips/?highlight=performance#performance-tips
> in particular the first one avoiding global variables.  I get up to 40
> times the performance by putting your code in a function.
>
> Cheers
> Lex
>
>
> On Sunday, March 15, 2015 at 2:10:29 PM UTC+10, Dallas Morisette wrote:
>>
>> I am very new to Julia. I'm working on adding some features to a fairly
>> simple Fortran simulation, and decided to try writing it in Python to make
>> it easier to explore variations. After a lot of optimization work I got it
>> within about 8x slower than the Fortran code. I had read about Julia and
>> had wanted a reason to try it, so I thought I'd see if I could get closer
>> to Fortran speeds in Julia. My initial results were depressingly slow
>> (140x slower than Fortran and 17x slower than Python) and before trying to
>> optimize it I tried some very simple benchmarks to try to understand how to
>> get good performance from Julia. One I tried was two different versions of
>> squaring each element of a 10,000 element array, one vectorized, and one
>> for-loop. I fully expected there to be a large difference in performance of
>> Python between the two, but I didn't expect Julia to be slower than Python
>> in BOTH cases. I also expected the for loop and vectorized versions to be
>> similar, if not the for loop faster given what I'd read
>> about devectorizing Julia code.
>>
>> I'm sure I'm doing something wrong, but can someone point out what?
>>
>> Here are the results
>>
>> # Python Version
>> import numpy as np
>> n = 10000
>> x = np.linspace(0.0,1.0,n)
>> y = np.zeros_like(x)
>> %timeit y = x**2
>> %timeit y = [xi**2 for xi in x]
>> 100000 loops, best of 3: 5.42 µs per loop
>> 100 loops, best of 3: 3.19 ms per loop
>>
>>
>>
>> # Julia Version
>> using TimeIt
>> n = 10000
>> x = linspace(0.0,1.0,n)
>> y = zeros(x)
>> @timeit y = x.^2
>> @timeit y = [xi^2 for xi in x]
>> 1000 loops, best of 3: 433.29 µs per loop
>> 100 loops, best of 3: 8.57 ms per loop
>>
>>
>> Thanks!
>>
>> Dallas Morisette
>>
>>
