Always wrap it in a function.
<http://docs.julialang.org/en/release-0.4/manual/performance-tips/>

But the real issue is that they don't evaluate to the same thing. I'd write it as:
const N = 100000

function test1()
    A = [[1.0 1.0001];[1.0002 1.0003]]
    z = A
    for i in 1:N
        z *= A
    end
    z
end

function test2()
    A = [[1.0 1.0001];[1.0002 1.0003]]
    z = A
    @parallel for i in 1:N
        z *= A
    end
    z
end
test1() == test2() # Test that the outputs are the same
@time test1()
@time test2()
Notice the test is false: test1() gives a 2x2 matrix of Infs, while test2()
returns the same matrix as A. Adding @parallel changes the computation
because each worker operates on its own local copy of z, as Nathan stated.
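If the goal really is a parallel A^N, the 0.4-era idiom (assuming workers have been added with addprocs) is to give @parallel a reduction operator, so that each worker's partial product is combined back into one result instead of being discarded in a local copy. A minimal sketch:

function test3()
    A = [[1.0 1.0001];[1.0002 1.0003]]
    # The (*) reduction multiplies each worker's partial product back
    # together, so the caller gets the true A^N rather than an unchanged z.
    @parallel (*) for i in 1:N
        A
    end
end

Note that test1() starts from z = A and then multiplies N more times, so it actually computes A^(N+1); shift the range by one if you want the two to agree exactly. (On Julia 1.0+ this macro lives in the Distributed stdlib as @distributed.)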
On Thursday, July 21, 2016 at 9:45:17 AM UTC-7, Ferran Mazzanti wrote:
>
> Hi Nathan,
>
> I posted the codes, so you can check if they do the same thing or not.
> These went to separate cells in Jupyter, nothing more and nothing less.
> Not even a single line I didn't post. And yes I understand your line of
> reasoning, so that's why I got astonished also.
> But I can see what is making this huge difference, and I'd like to know :)
>
> Best,
>
> Ferran.
>
> On Thursday, July 21, 2016 at 6:31:57 PM UTC+2, Nathan Smith wrote:
>>
>> Hey Ferran,
>>
>> You should be suspicious when your apparent speed-up surpasses the level
>> of parallelism available on your CPU. It looks like your codes don't
>> actually compute the same thing.
>>
>> I'm assuming you're trying to compute the matrix power A^1000000000 by
>> repeatedly multiplying A. In your parallel code, each process gets a
>> local copy of 'z' and uses that. This means each process is computing
>> something like A^(1000000000/# of procs). Check out this
>> <http://docs.julialang.org/en/release-0.4/manual/parallel-computing/#parallel-map-and-loops>
>> section of the documentation on parallel map and loops to see what I mean.
>>
>> That said, that doesn't completely explain your speed-up. You should also
>> make sure that each part of your script is wrapped in a function, and that
>> you 'warm up' each function by running it once before comparing.
>>
>> Cheers,
>> Nathan
>>
>> On Thursday, 21 July 2016 12:00:47 UTC-4, Ferran Mazzanti wrote:
>>>
>>> Hi,
>>>
>>> mostly showing my astonishment, but I can't even understand the figures
>>> in this stupid parallelization code
>>>
>>> A = [[1.0 1.0001];[1.0002 1.0003]]
>>> z = A
>>> tic()
>>> for i in 1:1000000000
>>>     z *= A
>>> end
>>> toc()
>>> A
>>>
>>> produces
>>>
>>> elapsed time: 105.458639263 seconds
>>>
>>> 2x2 Array{Float64,2}:
>>> 1.0 1.0001
>>> 1.0002 1.0003
>>>
>>>
>>>
>>> But then add @parallel in the for loop
>>>
>>> A = [[1.0 1.0001];[1.0002 1.0003]]
>>> z = A
>>> tic()
>>> @parallel for i in 1:1000000000
>>>     z *= A
>>> end
>>> toc()
>>> A
>>>
>>> and get
>>>
>>> elapsed time: 0.008912282 seconds
>>>
>>> 2x2 Array{Float64,2}:
>>> 1.0 1.0001
>>> 1.0002 1.0003
>>>
>>>
>>> look at the elapsed time differences! And I'm running this on my Xeon
>>> desktop, not even a cluster.
>>> Of course A-B reports
>>>
>>> 2x2 Array{Float64,2}:
>>> 0.0 0.0
>>> 0.0 0.0
>>>
>>>
>>> So is this what one should expect from this kind of simple
>>> parallelization? If so, I'm definitely *in love* with Julia :):):)
>>>
>>> Best,
>>>
>>> Ferran.
>>>
>>>
>>>