You can have just two DArrays, use localpart() to get each worker's local part of the arrays, and work off that.
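A rough sketch of that pattern, in plain Base Julia without any distribution: on each worker, the same kernel would run over `localpart(u)` extended by one boundary value fetched from each neighbour. The function name `stencil_sweep!` and the ghost-cell layout are just illustrative choices here, not anything from DistributedArrays itself.

```julia
# One Jacobi sweep over a worker's chunk. `chunk` is the local part
# extended by one ghost cell on each side (values owned by the
# neighbouring workers); `out` receives the updated interior values.
function stencil_sweep!(out::Vector{Float64}, chunk::Vector{Float64})
    # chunk has length(out) + 2 entries: [left ghost; interior; right ghost]
    for i in 1:length(out)
        out[i] = 0.5 * (chunk[i] + chunk[i + 2])
    end
    return out
end

# Example: a chunk of 4 interior points with ghost values 0.0 and 1.0.
chunk = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
out = zeros(4)
stencil_sweep!(out, chunk)
```

Because each worker only reads its own localpart plus two scalars per sweep, the per-iteration communication stays constant instead of scaling with the chunk size.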
With a single iteration, the network overhead will be much more than any gains from distributed computation -- it depends on the computation, of course. Currently, DArrays work best if the distributed computation can work solely off localparts. An efficient setindex! on DArrays is a TODO at this time.

On Mon, Jan 5, 2015 at 12:34 PM, Amuthan <[email protected]> wrote:

> Hi Amit: yes, the idea is to have just two DArrays, one each for the
> previous and current iterations. I had some trouble assigning values
> directly to a DArray (a setindex! error) and so had to write it like this.
> Do you know a way around this?
>
> By the way, the parallel code runs slower than the serial version even
> for just one iteration.
>
> On Sun, Jan 4, 2015 at 10:27 PM, Amit Murthy <[email protected]> wrote:
>
>> As written, this creates 1000 DArrays. I think you intended to have
>> only 2 of them and swap values in each iteration?
>>
>> On Sunday, 4 January 2015 11:07:47 UTC+5:30, Amuthan A. Ramabathiran wrote:
>>>
>>> Hello: I recently started exploring the parallel capabilities of Julia,
>>> and I need some help understanding and improving the performance of a
>>> very elementary parallel code using DArrays (I use Julia version
>>> 0.4.0-dev+2431). The code pasted below (based essentially on plife.jl)
>>> solves u''(x) = 0, x \in [0, 1] with u(0) and u(1) specified, using the
>>> 2nd-order central difference approximation. The parallel version of the
>>> code runs significantly slower than the serial version. It would be nice
>>> if someone could point out ways to improve this and/or suggest an
>>> efficient alternative.
>>>
>>> function laplace_1D_serial(u::Array{Float64})
>>>     N = length(u) - 2
>>>     u_new = zeros(N)
>>>
>>>     for i = 1:N
>>>         u_new[i] = 0.5(u[i] + u[i + 2])
>>>     end
>>>
>>>     u_new
>>> end
>>>
>>> function serial_iterate(u::Array{Float64})
>>>     u_new = laplace_1D_serial(u)
>>>
>>>     for i = 1:length(u_new)
>>>         u[i + 1] = u_new[i]
>>>     end
>>> end
>>>
>>> function parallel_iterate(u::DArray)
>>>     DArray(size(u), procs(u)) do I
>>>         J = I[1]
>>>
>>>         if myid() == 2
>>>             local_array = zeros(length(J) + 1)
>>>             for i = J[1]:J[end] + 1
>>>                 local_array[i - J[1] + 1] = u[i]
>>>             end
>>>             append!([float(u[1])], laplace_1D_serial(local_array))
>>>
>>>         elseif myid() == length(procs(u)) + 1
>>>             local_array = zeros(length(J) + 1)
>>>             for i = J[1] - 1:J[end]
>>>                 local_array[i - J[1] + 2] = u[i]
>>>             end
>>>             append!(laplace_1D_serial(local_array), [float(u[end])])
>>>
>>>         else
>>>             local_array = zeros(length(J) + 2)
>>>             for i = J[1] - 1:J[end] + 1
>>>                 local_array[i - J[1] + 2] = u[i]
>>>             end
>>>             laplace_1D_serial(local_array)
>>>         end
>>>     end
>>> end
>>>
>>> A sample run on my laptop with 4 processors:
>>>
>>> julia> u = zeros(1000); u[end] = 1.0; u_distributed = distribute(u);
>>>
>>> julia> @time for i = 1:1000
>>>            serial_iterate(u)
>>>        end
>>> elapsed time: 0.011452192 seconds (8300112 bytes allocated)
>>>
>>> julia> @time for i = 1:1000
>>>            u_distributed = parallel_iterate(u_distributed)
>>>        end
>>> elapsed time: 4.461922218 seconds (190565036 bytes allocated, 10.17% gc time)
>>>
>>> Thanks for your help!
>>>
>>> Cheers,
>>> Amuthan
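Note also that `serial_iterate` above allocates a fresh `u_new` on every sweep; the two-buffer swap Amit suggests for the DArrays works just as well for the serial baseline. A minimal sketch of that pattern on plain Arrays (the name `jacobi!` is made up for this example):

```julia
# Jacobi iteration for u'' = 0 with fixed endpoints, reusing two
# preallocated buffers instead of allocating u_new on every sweep.
function jacobi!(u::Vector{Float64}, scratch::Vector{Float64}, iters::Int)
    @assert length(u) == length(scratch)
    for _ in 1:iters
        scratch[1] = u[1]          # boundary values are carried over
        scratch[end] = u[end]
        for i in 2:length(u) - 1
            scratch[i] = 0.5 * (u[i - 1] + u[i + 1])
        end
        u, scratch = scratch, u    # swap roles; no copy, no allocation
    end
    return u                       # the buffer holding the latest iterate
end

u = zeros(1000); u[end] = 1.0
u = jacobi!(u, similar(u), 1000)   # scratch is fully overwritten each sweep
```

Since the scratch buffer is completely rewritten on every sweep, `similar(u)` (uninitialized memory) is safe as the second buffer.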
