You can have just two DArrays, use localpart() to get each worker's local part of the arrays, and work off that.
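A rough sketch of that pattern, in plain Base Julia without any distribution: on each worker, the same kernel would run over `localpart(u)` extended by one boundary value fetched from each neighbour. The function name `stencil_sweep!` and the ghost-cell layout are just illustrative choices here, not anything from DistributedArrays itself.

```julia
# One Jacobi sweep over a worker's chunk. `chunk` is the local part
# extended by one ghost cell on each side (values owned by the
# neighbouring workers); `out` receives the updated interior values.
function stencil_sweep!(out::Vector{Float64}, chunk::Vector{Float64})
    # chunk has length(out) + 2 entries: [left ghost; interior; right ghost]
    for i in 1:length(out)
        out[i] = 0.5 * (chunk[i] + chunk[i + 2])
    end
    return out
end

# Example: a chunk of 4 interior points with ghost values 0.0 and 1.0.
chunk = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
out = zeros(4)
stencil_sweep!(out, chunk)
```

Because each worker only reads its own localpart plus two scalars per sweep, the per-iteration communication stays constant instead of scaling with the chunk size.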
With a single iteration, the network overhead will be much more than any gains from distributed computation -- it depends on the computation, of course. Currently, DArrays work best if the distributed computation can work solely off localparts. An efficient setindex! on DArrays is a TODO at this time.

On Mon, Jan 5, 2015 at 12:34 PM, Amuthan <[email protected]> wrote:

> Hi Amit: yes, the idea is to have just two DArrays, one each for the
> previous and current iterations. I had some trouble assigning values
> directly to a DArray (a setindex! error) and so had to write it like this.
> Do you know a way around this?
>
> By the way, the parallel code runs slower than the serial version even
> for just one iteration.
>
> On Sun, Jan 4, 2015 at 10:27 PM, Amit Murthy <[email protected]> wrote:
>
>> As written, this creates 1000 DArrays. I think you intended to have
>> only 2 of them and swap values in each iteration?
>>
>> On Sunday, 4 January 2015 11:07:47 UTC+5:30, Amuthan A. Ramabathiran wrote:
>>>
>>> Hello: I recently started exploring the parallel capabilities of Julia,
>>> and I need some help understanding and improving the performance of a
>>> very elementary parallel code using DArrays (I use Julia version
>>> 0.4.0-dev+2431). The code pasted below (based essentially on plife.jl)
>>> solves u''(x) = 0, x \in [0, 1] with u(0) and u(1) specified, using the
>>> 2nd-order central difference approximation. The parallel version of the
>>> code runs significantly slower than the serial version. It would be nice
>>> if someone could point out ways to improve this and/or suggest an
>>> efficient alternative.
>>>
>>> function laplace_1D_serial(u::Array{Float64})
>>>     N = length(u) - 2
>>>     u_new = zeros(N)
>>>
>>>     for i = 1:N
>>>         u_new[i] = 0.5(u[i] + u[i + 2])
>>>     end
>>>
>>>     u_new
>>> end
>>>
>>> function serial_iterate(u::Array{Float64})
>>>     u_new = laplace_1D_serial(u)
>>>
>>>     for i = 1:length(u_new)
>>>         u[i + 1] = u_new[i]
>>>     end
>>> end
>>>
>>> function parallel_iterate(u::DArray)
>>>     DArray(size(u), procs(u)) do I
>>>         J = I[1]
>>>
>>>         if myid() == 2
>>>             local_array = zeros(length(J) + 1)
>>>             for i = J[1]:J[end] + 1
>>>                 local_array[i - J[1] + 1] = u[i]
>>>             end
>>>             append!([float(u[1])], laplace_1D_serial(local_array))
>>>
>>>         elseif myid() == length(procs(u)) + 1
>>>             local_array = zeros(length(J) + 1)
>>>             for i = J[1] - 1:J[end]
>>>                 local_array[i - J[1] + 2] = u[i]
>>>             end
>>>             append!(laplace_1D_serial(local_array), [float(u[end])])
>>>
>>>         else
>>>             local_array = zeros(length(J) + 2)
>>>             for i = J[1] - 1:J[end] + 1
>>>                 local_array[i - J[1] + 2] = u[i]
>>>             end
>>>             laplace_1D_serial(local_array)
>>>         end
>>>     end
>>> end
>>>
>>> A sample run on my laptop with 4 processors:
>>>
>>> julia> u = zeros(1000); u[end] = 1.0; u_distributed = distribute(u);
>>>
>>> julia> @time for i = 1:1000
>>>            serial_iterate(u)
>>>        end
>>> elapsed time: 0.011452192 seconds (8300112 bytes allocated)
>>>
>>> julia> @time for i = 1:1000
>>>            u_distributed = parallel_iterate(u_distributed)
>>>        end
>>> elapsed time: 4.461922218 seconds (190565036 bytes allocated, 10.17% gc time)
>>>
>>> Thanks for your help!
>>>
>>> Cheers,
>>> Amuthan
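Note also that `serial_iterate` above allocates a fresh `u_new` on every sweep; the two-buffer swap Amit suggests for the DArrays works just as well for the serial baseline. A minimal sketch of that pattern on plain Arrays (the name `jacobi!` is made up for this example):

```julia
# Jacobi iteration for u'' = 0 with fixed endpoints, reusing two
# preallocated buffers instead of allocating u_new on every sweep.
function jacobi!(u::Vector{Float64}, scratch::Vector{Float64}, iters::Int)
    @assert length(u) == length(scratch)
    for _ in 1:iters
        scratch[1] = u[1]          # boundary values are carried over
        scratch[end] = u[end]
        for i in 2:length(u) - 1
            scratch[i] = 0.5 * (u[i - 1] + u[i + 1])
        end
        u, scratch = scratch, u    # swap roles; no copy, no allocation
    end
    return u                       # the buffer holding the latest iterate
end

u = zeros(1000); u[end] = 1.0
u = jacobi!(u, similar(u), 1000)   # scratch is fully overwritten each sweep
```

Since the scratch buffer is completely rewritten on every sweep, `similar(u)` (uninitialized memory) is safe as the second buffer.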
