There was one typo in my functions. To be consistent with the parallel version, the serial version should be:
function serial_example()
    A = [[1.0 1.001]; [1.002 1.003]]
    z = eye(2)
    for i in 1:1000000000
        z *= A
    end
    return z
end
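For reference, a matching parallel version might look like the sketch below. This is only a minimal sketch, assuming Julia 0.4-style @parallel with a (*) reduction; the name parallel_example is illustrative, not from the original post:

function parallel_example()
    A = [[1.0 1.001]; [1.002 1.003]]
    # Each worker multiplies A over its chunk of the range; the (*)
    # reducer then multiplies the partial products together, so the
    # result equals the serial A^1000000000.
    z = @parallel (*) for i in 1:1000000000
        A
    end
    return z
end

Note that without a reducer, @parallel only launches the work asynchronously and returns immediately, which is why the quoted @parallel loop below appears to finish in milliseconds while leaving z untouched on the master process.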
With 4 processors, I see roughly a 2x speedup for the parallel version, and the calculations are consistent.

On Thursday, 21 July 2016 13:02:52 UTC-4, Nathan Smith wrote:
>
> In a Jupyter notebook, add processors with addprocs(N).
>
> On Thursday, 21 July 2016 12:59:02 UTC-4, Nathan Smith wrote:
>>
>> To be clear, you need to compare the final 'z', not the final 'A', to
>> check whether your calculations are consistent. The matrix A does not
>> change throughout this calculation, but the matrix z does.
>> Also, there is no parallelism with the @parallel loop unless you start
>> Julia with 'julia -p N', where N is the number of processes you'd like
>> to use.
>>
>> On Thursday, 21 July 2016 12:45:17 UTC-4, Ferran Mazzanti wrote:
>>>
>>> Hi Nathan,
>>>
>>> I posted the codes, so you can check whether they do the same thing or
>>> not. They went into separate cells in Jupyter, nothing more and nothing
>>> less; not even a single line I didn't post. And yes, I understand your
>>> line of reasoning, which is why I was astonished too.
>>> But I can't see what is making this huge difference, and I'd like to
>>> know :)
>>>
>>> Best,
>>>
>>> Ferran.
>>>
>>> On Thursday, July 21, 2016 at 6:31:57 PM UTC+2, Nathan Smith wrote:
>>>>
>>>> Hey Ferran,
>>>>
>>>> You should be suspicious when your apparent speedup surpasses the
>>>> level of parallelism available on your CPU. It looks like your codes
>>>> don't actually compute the same thing.
>>>>
>>>> I'm assuming you're trying to compute the matrix power A^1000000000
>>>> by repeatedly multiplying by A. In your parallel code, each process
>>>> gets a local copy of 'z' and uses that. This means each process is
>>>> computing something like A^(1000000000 / number of procs). Check out
>>>> the section of the documentation on parallel map and loops to see
>>>> what I mean:
>>>> http://docs.julialang.org/en/release-0.4/manual/parallel-computing/#parallel-map-and-loops
>>>>
>>>> That said, that doesn't completely explain your speedup. You should
>>>> also make sure that each part of your script is wrapped in a function
>>>> and that you 'warm up' each function by running it once before
>>>> comparing timings.
>>>>
>>>> Cheers,
>>>> Nathan
>>>>
>>>> On Thursday, 21 July 2016 12:00:47 UTC-4, Ferran Mazzanti wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> Mostly showing my astonishment, but I can't even understand the
>>>>> figures in this stupid parallelization code:
>>>>>
>>>>> A = [[1.0 1.0001]; [1.0002 1.0003]]
>>>>> z = A
>>>>> tic()
>>>>> for i in 1:1000000000
>>>>>     z *= A
>>>>> end
>>>>> toc()
>>>>> A
>>>>>
>>>>> produces
>>>>>
>>>>> elapsed time: 105.458639263 seconds
>>>>>
>>>>> 2x2 Array{Float64,2}:
>>>>>  1.0     1.0001
>>>>>  1.0002  1.0003
>>>>>
>>>>> But then add @parallel to the for loop:
>>>>>
>>>>> A = [[1.0 1.0001]; [1.0002 1.0003]]
>>>>> z = A
>>>>> tic()
>>>>> @parallel for i in 1:1000000000
>>>>>     z *= A
>>>>> end
>>>>> toc()
>>>>> A
>>>>>
>>>>> and get
>>>>>
>>>>> elapsed time: 0.008912282 seconds
>>>>>
>>>>> 2x2 Array{Float64,2}:
>>>>>  1.0     1.0001
>>>>>  1.0002  1.0003
>>>>>
>>>>> Look at the elapsed-time difference! And I'm running this on my Xeon
>>>>> desktop, not even a cluster.
>>>>> Of course A - B reports
>>>>>
>>>>> 2x2 Array{Float64,2}:
>>>>>  0.0  0.0
>>>>>  0.0  0.0
>>>>>
>>>>> So is this what one should expect from this kind of simple
>>>>> parallelization? If so, I'm definitely *in love* with Julia :):):)
>>>>>
>>>>> Best,
>>>>>
>>>>> Ferran.
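As Nathan suggests above, wrap each version in a function and run it once before timing it, so that JIT compilation is excluded from the measurement. A minimal sketch of that workflow, using the serial_example/parallel_example functions from this message (addprocs is from the thread above; @time here stands in for the tic()/toc() pair in the quoted code):

addprocs(4)                # start 4 worker processes first

serial_example()           # warm-up run: triggers compilation
@time serial_example()     # timed run, compilation cost excluded

parallel_example()         # warm up the parallel version too
@time parallel_example()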
