One typo in my functions: the serial version should be

function serial_example()
    A = [[1.0 1.001];[1.002 1.003]]
    z = eye(2)  # start from the identity so z accumulates A^n
    for i in 1:1000000000
        z *= A
    end
    return z
end

to be consistent. With 4 processes, I see roughly a 2x speed-up for the 
parallel version, and the calculations are consistent.
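
For reference, a minimal sketch of a parallel version that stays consistent 
with the serial one uses @parallel's reduction form (the name 
parallel_example is just illustrative, not necessarily the exact code run 
here). The (*) reduction is safe in this case because every factor is the 
same matrix A, so the grouping of the partial products cannot change the 
answer:

function parallel_example()
    A = [1.0 1.001; 1.002 1.003]
    # Each worker multiplies A over its chunk of the range; the (*)
    # reduction then combines the partial products into A^1000000000.
    z = @parallel (*) for i in 1:1000000000
        A
    end
    return z
end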



On Thursday, 21 July 2016 13:02:52 UTC-4, Nathan Smith wrote:
>
> In a Jupyter notebook, add worker processes with addprocs(N). 
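>
> For example, to get four workers (the count is just illustrative):
>
> addprocs(4)    # add 4 worker processes to the session
> nworkers()     # => 4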
>
> On Thursday, 21 July 2016 12:59:02 UTC-4, Nathan Smith wrote:
>>
>> To be clear, you need to compare the final 'z', not the final 'A', to check 
>> whether your calculations are consistent. The matrix A does not change 
>> throughout this calculation, but the matrix z does.
>> Also, there is no parallelism with the @parallel loop unless you start 
>> Julia with 'julia -p N', where N is the number of processes you'd like to 
>> use.
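>>
>> For example, from the shell (4 is just an illustrative count):
>>
>> $ julia -p 4    # launch Julia with 4 additional worker processes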
>>
>> On Thursday, 21 July 2016 12:45:17 UTC-4, Ferran Mazzanti wrote:
>>>
>>> Hi Nathan,
>>>
>>> I posted the codes, so you can check whether they do the same thing or not. 
>>> These went into separate cells in Jupyter, nothing more and nothing less; 
>>> I didn't leave out a single line. And yes, I understand your line of 
>>> reasoning; that's why I was astonished too.
>>> But I can't see what is making this huge difference, and I'd like to know 
>>> :)
>>>
>>> Best,
>>>
>>> Ferran.
>>>
>>> On Thursday, July 21, 2016 at 6:31:57 PM UTC+2, Nathan Smith wrote:
>>>>
>>>> Hey Ferran, 
>>>>
>>>> You should be suspicious when your apparent speed-up surpasses the 
>>>> level of parallelism available on your CPU. It looks like your codes don't 
>>>> actually compute the same thing.
>>>>
>>>> I'm assuming you're trying to compute the matrix power of A 
>>>> (A^1000000000) by repeatedly multiplying by A. In your parallel code, each 
>>>> process gets a local copy of 'z' and 
>>>> uses that. This means each process is computing something like 
>>>> A^(1000000000 / number of processes). Check out this 
>>>> <http://docs.julialang.org/en/release-0.4/manual/parallel-computing/#parallel-map-and-loops>
>>>>  section 
>>>> of the documentation on parallel map and loops to see what I mean.
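>>>>
>>>> A minimal illustration of that copying behaviour, along the lines of the 
>>>> example in that section of the manual:
>>>>
>>>> a = zeros(4)
>>>> @parallel for i in 1:4
>>>>     a[i] = i    # each worker writes to its own local copy of 'a'
>>>> end
>>>> a               # still all zeros on the master process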
>>>>
>>>> That said, that doesn't completely explain your speed-up. You should 
>>>> also make sure that each part of your script is wrapped in a function and 
>>>> that you 'warm up' each function by running it once before comparing.
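>>>>
>>>> For example, with the loop wrapped in a function (such as the 
>>>> serial_example defined at the top of this thread):
>>>>
>>>> serial_example()          # first call triggers JIT compilation
>>>> @time serial_example()    # second call measures the actual runtime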
>>>>
>>>> Cheers, 
>>>> Nathan
>>>>
>>>> On Thursday, 21 July 2016 12:00:47 UTC-4, Ferran Mazzanti wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> mostly showing my astonishment, but I can't even understand the figures 
>>>>> in this stupid parallelization code
>>>>>
>>>>> A = [[1.0 1.0001];[1.0002 1.0003]]
>>>>> z = A
>>>>> tic()
>>>>> for i in 1:1000000000
>>>>>     z *= A
>>>>> end
>>>>> toc()
>>>>> A
>>>>>
>>>>> produces
>>>>>
>>>>> elapsed time: 105.458639263 seconds
>>>>>
>>>>> 2x2 Array{Float64,2}:
>>>>>  1.0     1.0001
>>>>>  1.0002  1.0003
>>>>>
>>>>>
>>>>>
>>>>> But then I add @parallel to the for loop
>>>>>
>>>>> A = [[1.0 1.0001];[1.0002 1.0003]]
>>>>> z = A
>>>>> tic()
>>>>> @parallel for i in 1:1000000000
>>>>>     z *= A
>>>>> end
>>>>> toc()
>>>>> A
>>>>>
>>>>> and get 
>>>>>
>>>>> elapsed time: 0.008912282 seconds
>>>>>
>>>>> 2x2 Array{Float64,2}:
>>>>>  1.0     1.0001
>>>>>  1.0002  1.0003
>>>>>
>>>>>
>>>>> look at the difference in elapsed times! And I'm running this on my Xeon 
>>>>> desktop, not even a cluster.
>>>>> Of course A-B reports
>>>>>
>>>>> 2x2 Array{Float64,2}:
>>>>>  0.0  0.0
>>>>>  0.0  0.0
>>>>>
>>>>>
>>>>> So is this what one should expect from this kind of simple 
>>>>> parallelization? If so, I'm definitely *in love* with Julia :):):)
>>>>>
>>>>> Best,
>>>>>
>>>>> Ferran.
>>>>>
>>>>>
>>>>>
