Hi Andreas: this is very helpful, thanks!

On Wed, Jan 7, 2015 at 6:06 AM, Andreas Noack <[email protected]> wrote:
> I made a plot that compares the time it takes to move an array with the
> one-sided methods we are using now and with MPI.jl. It is here
>
> https://github.com/JuliaLang/julia/issues/9167#issuecomment-64721543
>
> 2015-01-07 1:54 GMT-05:00 Amuthan <[email protected]>:
>
>> Amit: Thanks for the suggestion. I gave it a quick try, but wasn't
>> successful. It appears to me that communication between the processors (to
>> obtain the boundary data) would require reconstructing the DArray from the
>> localparts at the end of each iteration. I guess I'll have to take a deeper
>> look into the implementation of DArrays to understand how best to implement
>> this.
>>
>> In the meantime, I got a reasonable speedup using the Julia wrapper for
>> MPI (https://github.com/JuliaParallel/MPI.jl). Has anyone tried
>> comparing the performance of the one-sided message passing model of DArray
>> and the standard (2-sided) MPI model?
>>
>> Amuthan
>>
>> On Mon, Jan 5, 2015 at 12:53 AM, Amit Murthy <[email protected]>
>> wrote:
>>
>>> You can have only two DArrays and use localpart() to get the local parts
>>> of the arrays on each worker and work off that.
>>>
>>> With a single iteration the network overhead will be much more than any
>>> gains from distributed computation - it depends on the computation of
>>> course.
>>>
>>> Currently, DArrays work best if the distributed computation can work
>>> solely off localparts. An efficient means of setindex! on darrays is a TODO
>>> at this time.
>>>
>>> On Mon, Jan 5, 2015 at 12:34 PM, Amuthan <[email protected]> wrote:
>>>
>>>> Hi Amit: yes, the idea is to have just two DArrays, one each for the
>>>> previous and current iterations. I had some trouble assigning values
>>>> directly to a DArray (a setindex! error) and so had to write it like this.
>>>> Do you know any means around this?
>>>>
>>>> Btw, the parallel code runs slower than the serial version even for
>>>> just one iteration.
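The approach Amit describes (each worker updates only its local part) still needs the neighbours' edge values at every iteration; that is the boundary data Amuthan mentions, and what a two-sided MPI halo exchange supplies. The following is a minimal single-process sketch of that structure, assuming plain `Vector{Float64}` chunks in place of DArray local parts and ordinary array copies in place of communication. It uses no DArray, Distributed, or MPI.jl calls, and the function names `split_with_ghosts`, `exchange_halos!`, `halo_iterate!` and `gather_chunks` are hypothetical.

```julia
# Hypothetical single-process simulation of a ghost-cell (halo) exchange.
# Each chunk is stored as [left ghost; owned values...; right ghost].

# Split the full grid (including the two fixed boundary values) into `p`
# contiguous chunks, each padded with one ghost cell on each side.
function split_with_ghosts(u::Vector{Float64}, p::Int)
    n = length(u)
    bounds = round.(Int, range(0, stop = n, length = p + 1))
    chunks = Vector{Vector{Float64}}(undef, p)
    for k in 1:p
        owned = u[bounds[k]+1:bounds[k+1]]
        chunks[k] = [0.0; owned; 0.0]
    end
    chunks
end

# Copy each chunk's outermost owned values into its neighbours' ghost cells.
# In a distributed code this is the only step that would communicate.
function exchange_halos!(chunks)
    for k in 1:length(chunks)-1
        left, right = chunks[k], chunks[k+1]
        left[end] = right[2]      # right neighbour's first owned value
        right[1] = left[end-1]    # left neighbour's last owned value
    end
    chunks
end

# One Jacobi sweep per chunk using only local data (owned values + ghosts).
# The first owned value of chunk 1 and the last owned value of chunk p are
# the fixed boundary values u(0) and u(1) and are never updated.
function halo_iterate!(chunks, scratch)
    exchange_halos!(chunks)
    p = length(chunks)
    for k in 1:p
        c, s = chunks[k], scratch[k]
        lo = k == 1 ? 3 : 2
        hi = k == p ? length(c) - 2 : length(c) - 1
        for i in lo:hi
            s[i] = 0.5 * (c[i-1] + c[i+1])   # reads only old values
        end
        for i in lo:hi
            c[i] = s[i]
        end
    end
    chunks
end

# Reassemble the owned values (dropping ghosts) into one vector.
gather_chunks(chunks) = reduce(vcat, [c[2:end-1] for c in chunks])

u = zeros(1000); u[end] = 1.0
chunks = split_with_ghosts(u, 4)
scratch = copy.(chunks)
for _ in 1:1000
    halo_iterate!(chunks, scratch)
end
u_halo = gather_chunks(chunks)
```

Because every chunk's ghosts are filled before any chunk is updated, the result matches a plain serial Jacobi sweep. In a real distributed version, `exchange_halos!` would move one value in each direction per neighbouring pair per iteration, while everything else touches local data only; whether that beats the serial loop at this problem size is a separate question, as Amit notes about network overhead.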
>>>>
>>>> On Sun, Jan 4, 2015 at 10:27 PM, Amit Murthy <[email protected]>
>>>> wrote:
>>>>
>>>>> As written, this is creating 1000 DArrays. I think you intended to
>>>>> have only 2 of them and swap values in each iteration?
>>>>>
>>>>>
>>>>> On Sunday, 4 January 2015 11:07:47 UTC+5:30, Amuthan A. Ramabathiran
>>>>> wrote:
>>>>>>
>>>>>> Hello: I recently started exploring the parallel capabilities of
>>>>>> Julia and I need some help in understanding and improving the
>>>>>> performance of a very elementary parallel code using DArrays (I use
>>>>>> Julia version 0.4.0-dev+2431). The code pasted below (based essentially
>>>>>> on plife.jl) solves u''(x) = 0, x \in [0,1] with u(0) and u(1)
>>>>>> specified, using the 2nd order central difference approximation. The
>>>>>> parallel version of the code runs significantly slower than the serial
>>>>>> version. It would be nice if someone could point out ways to improve
>>>>>> this and/or suggest an alternative efficient version.
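For reference, the update in `laplace_1D_serial` is the Jacobi iteration for the second-order central-difference discretization of u''(x) = 0. The indexing below is an assumption chosen to match the code: n = length(u), grid points x_i = (i-1)h with h = 1/(n-1), u_1 = u(0) and u_n = u(1) held fixed, and iterate number k.

```latex
\begin{aligned}
u''(x_i) &\approx \frac{u_{i-1} - 2u_i + u_{i+1}}{h^2} = 0
  \quad\Longrightarrow\quad u_i = \tfrac{1}{2}\,(u_{i-1} + u_{i+1}),
  \qquad i = 2, \dots, n-1, \\
u_i^{(k+1)} &= \tfrac{1}{2}\,\bigl(u_{i-1}^{(k)} + u_{i+1}^{(k)}\bigr),
  \qquad u_1^{(k)} = u(0), \quad u_n^{(k)} = u(1).
\end{aligned}
```

In the code, `u_new[i] = 0.5(u[i] + u[i + 2])` is the second line with the index shifted by one (`u_new[i]` holds u_{i+1}^{(k+1)}). The exact discrete solution is the straight line u_i = u(0) + (u(1) - u(0))(i-1)/(n-1), which satisfies u_{i-1} + u_{i+1} = 2u_i.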
>>>>>>
>>>>>> function laplace_1D_serial(u::Array{Float64})
>>>>>>     N = length(u) - 2
>>>>>>     u_new = zeros(N)
>>>>>>
>>>>>>     for i = 1:N
>>>>>>         u_new[i] = 0.5(u[i] + u[i + 2])
>>>>>>     end
>>>>>>
>>>>>>     u_new
>>>>>> end
>>>>>>
>>>>>> function serial_iterate(u::Array{Float64})
>>>>>>     u_new = laplace_1D_serial(u)
>>>>>>
>>>>>>     for i = 1:length(u_new)
>>>>>>         u[i + 1] = u_new[i]
>>>>>>     end
>>>>>> end
>>>>>>
>>>>>> function parallel_iterate(u::DArray)
>>>>>>     DArray(size(u), procs(u)) do I
>>>>>>         J = I[1]
>>>>>>
>>>>>>         if myid() == 2
>>>>>>             local_array = zeros(length(J) + 1)
>>>>>>             for i = J[1] : J[end] + 1
>>>>>>                 local_array[i - J[1] + 1] = u[i]
>>>>>>             end
>>>>>>             append!([float(u[1])], laplace_1D_serial(local_array))
>>>>>>
>>>>>>         elseif myid() == length(procs(u)) + 1
>>>>>>             local_array = zeros(length(J) + 1)
>>>>>>             for i = J[1] - 1 : J[end]
>>>>>>                 local_array[i - J[1] + 2] = u[i]
>>>>>>             end
>>>>>>             append!(laplace_1D_serial(local_array), [float(u[end])])
>>>>>>
>>>>>>         else
>>>>>>             local_array = zeros(length(J) + 2)
>>>>>>             for i = J[1] - 1 : J[end] + 1
>>>>>>                 local_array[i - J[1] + 2] = u[i]
>>>>>>             end
>>>>>>             laplace_1D_serial(local_array)
>>>>>>
>>>>>>         end
>>>>>>     end
>>>>>> end
>>>>>>
>>>>>> A sample run on my laptop with 4 processors:
>>>>>>
>>>>>> julia> u = zeros(1000); u[end] = 1.0; u_distributed = distribute(u);
>>>>>>
>>>>>> julia> @time for i = 1:1000
>>>>>>            serial_iterate(u)
>>>>>>        end
>>>>>> elapsed time: 0.011452192 seconds (8300112 bytes allocated)
>>>>>>
>>>>>> julia> @time for i = 1:1000
>>>>>>            u_distributed = parallel_iterate(u_distributed)
>>>>>>        end
>>>>>> elapsed time: 4.461922218 seconds (190565036 bytes allocated, 10.17% gc time)
>>>>>>
>>>>>> Thanks for your help!
>>>>>>
>>>>>> Cheers,
>>>>>> Amuthan
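Amit's suggestion of keeping exactly two arrays and swapping them each iteration can be illustrated without DArrays. The sketch below assumes a plain `Vector{Float64}` and a hypothetical function name `jacobi_swap!`. It allocates the second buffer once and swaps the two references after each sweep, whereas `laplace_1D_serial` above allocates a fresh `zeros(N)` on every call.

```julia
# Hypothetical double-buffered Jacobi solver for u''(x) = 0 on a uniform grid.
# `cur` holds iterate k and `nxt` receives iterate k+1. Both buffers start as
# copies of `u`, so both carry the fixed end values u[1] and u[end].
function jacobi_swap!(u::Vector{Float64}, iters::Int)
    cur = u
    nxt = copy(u)              # second buffer, allocated once
    n = length(u)
    for _ in 1:iters
        for i in 2:n-1
            nxt[i] = 0.5 * (cur[i-1] + cur[i+1])
        end
        cur, nxt = nxt, cur    # swap references; no data is copied
    end
    if cur !== u               # odd iteration count: result is in the copy
        copyto!(u, cur)
    end
    u
end

u = zeros(1000); u[end] = 1.0
jacobi_swap!(u, 1000)
```

The same pattern would apply per worker to the local parts of two DArrays, with the boundary exchange added; that extension is not shown here.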
