In a Jupyter notebook, add worker processes with addprocs(N).
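For example (a minimal sketch; the worker count of 4 is arbitrary):

addprocs(4)   # spawn 4 additional worker processes from within the notebook
nworkers()    # 4
nprocs()      # 5: the master process plus the 4 workers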
On Thursday, 21 July 2016 12:59:02 UTC-4, Nathan Smith wrote:
>
> To be clear, you need to compare the final 'z', not the final 'A', to
> check whether your calculations are consistent. The matrix A does not
> change throughout this calculation, but the matrix z does.
> Also, there is no parallelism with the @parallel loop unless you start
> Julia with 'julia -p N', where N is the number of processes you'd like
> to use.
>
> On Thursday, 21 July 2016 12:45:17 UTC-4, Ferran Mazzanti wrote:
>>
>> Hi Nathan,
>>
>> I posted the codes, so you can check whether they do the same thing or
>> not. These went into separate cells in Jupyter, nothing more and nothing
>> less. Not a single line I didn't post. And yes, I understand your line
>> of reasoning; that's why I was astonished too.
>> But I can't see what is making this huge difference, and I'd like to
>> know :)
>>
>> Best,
>>
>> Ferran.
>>
>> On Thursday, July 21, 2016 at 6:31:57 PM UTC+2, Nathan Smith wrote:
>>>
>>> Hey Ferran,
>>>
>>> You should be suspicious when your apparent speed-up surpasses the
>>> level of parallelism available on your CPU. It looks like your codes
>>> don't actually compute the same thing.
>>>
>>> I'm assuming you're trying to compute the matrix power A^1000000000 by
>>> repeatedly multiplying by A. In your parallel code, each process gets a
>>> local copy of 'z' and uses that. This means each process is computing
>>> something like A^(1000000000 / number of processes). Check out this
>>> <http://docs.julialang.org/en/release-0.4/manual/parallel-computing/#parallel-map-and-loops>
>>> section of the documentation on parallel map and loops to see what I
>>> mean.
>>>
>>> That said, that doesn't explain your speed-up completely; you should
>>> also make sure that each part of your script is wrapped in a function
>>> and that you 'warm up' each function by running it once before
>>> comparing.
>>>
>>> Cheers,
>>> Nathan
>>>
>>> On Thursday, 21 July 2016 12:00:47 UTC-4, Ferran Mazzanti wrote:
>>>>
>>>> Hi,
>>>>
>>>> mostly showing my astonishment: I can't even understand the figures
>>>> in this stupid parallelization code
>>>>
>>>> A = [1.0 1.0001; 1.0002 1.0003]
>>>> z = A
>>>> tic()
>>>> for i in 1:1000000000
>>>>     z *= A
>>>> end
>>>> toc()
>>>> A
>>>>
>>>> produces
>>>>
>>>> elapsed time: 105.458639263 seconds
>>>>
>>>> 2x2 Array{Float64,2}:
>>>>  1.0     1.0001
>>>>  1.0002  1.0003
>>>>
>>>> But then add @parallel to the for loop
>>>>
>>>> A = [1.0 1.0001; 1.0002 1.0003]
>>>> z = A
>>>> tic()
>>>> @parallel for i in 1:1000000000
>>>>     z *= A
>>>> end
>>>> toc()
>>>> A
>>>>
>>>> and get
>>>>
>>>> elapsed time: 0.008912282 seconds
>>>>
>>>> 2x2 Array{Float64,2}:
>>>>  1.0     1.0001
>>>>  1.0002  1.0003
>>>>
>>>> Look at the elapsed time difference! And I'm running this on my Xeon
>>>> desktop, not even a cluster.
>>>> Of course A - B reports
>>>>
>>>> 2x2 Array{Float64,2}:
>>>>  0.0  0.0
>>>>  0.0  0.0
>>>>
>>>> So is this what one should expect from this kind of simple
>>>> parallelization? If so, I'm definitely *in love* with Julia :):):)
>>>>
>>>> Best,
>>>>
>>>> Ferran.
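For reference, here is a minimal sketch of a consistent serial/parallel comparison, assuming the Julia 0.4-era syntax used in this thread (the helper names serial_power and parallel_power are made up for illustration). It encodes two points from the discussion above: a bare @parallel loop returns immediately while the workers keep running in the background, which is why the 0.009-second timing above measures essentially nothing, and giving @parallel a (*) reducer makes it both wait for the workers and combine their per-process partial products into a single result:

addprocs(3)   # 1 master + 3 workers; the count is arbitrary

A = [1.0 1.0001; 1.0002 1.0003]

function serial_power(A, n)
    z = eye(A)   # start from the identity so the result is exactly A^n
    for i in 1:n
        z *= A
    end
    z
end

function parallel_power(A, n)
    # each iteration contributes one factor of A; the (*) reducer multiplies
    # the per-process partial products together, so this also returns A^n,
    # and, unlike a bare @parallel loop, it blocks until the workers finish
    @parallel (*) for i in 1:n
        A
    end
end

serial_power(A, 100)     # run both once to compile them, so the
parallel_power(A, 100)   # warm-up cost is excluded from any timing

# the entries of A exceed 1, so A^n overflows Float64 for very large n;
# a modest n is enough to check that the two versions agree
serial_power(A, 100) - parallel_power(A, 100)

Since matrix multiplication is associative and every factor here is the same A, the serial and parallel groupings agree up to floating-point roundoff, so the final difference should be a 2x2 matrix of (near-)zeros. That is the comparison of the final 'z' values suggested above, rather than of A, which never changes.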
