Re: [julia-users] Re: DArrays performance

Amuthan Wed, 07 Jan 2015 09:29:47 -0800

Hi Andreas: this is very helpful, thanks!

On Wed, Jan 7, 2015 at 6:06 AM, Andreas Noack <[email protected]>
wrote:


> I made a plot that compares the time it takes to move an array with the
> one-sided methods we are using now and with MPI.jl. It is here
>
> https://github.com/JuliaLang/julia/issues/9167#issuecomment-64721543
>
> 2015-01-07 1:54 GMT-05:00 Amuthan <[email protected]>:
>
>> Amit: Thanks for the suggestion. I gave it a quick try, but wasn't
>> successful. It appears to me that communication between the processors (to
>> obtain the boundary data) would require reconstructing the DArray from the
>> localparts at the end of each iteration. I guess I'll have to take a deeper
>> look into the implementation of DArrays to understand how best to implement
>> this.
>>
>> In the meantime, I got a reasonable speedup using the Julia wrapper for
>> MPI (https://github.com/JuliaParallel/MPI.jl). Has anyone tried
>> comparing the performance of the one-sided message passing model of DArray
>> and the standard (2-sided) MPI model?
>>
>> Amuthan
>>
>> On Mon, Jan 5, 2015 at 12:53 AM, Amit Murthy <[email protected]>
>> wrote:
>>
>>> You can have only two DArrays and use localpart() to get the local parts
>>> of the arrays on each worker and work off that.
>>>
>>> With a single iteration the network overhead will be much more than any
>>> gains from distributed computation - it depends on the computation of
>>> course.
>>>
>>> Currently, DArrays work best if the distributed computation can work
>>> solely off localparts. An efficient means of setindex! on darrays is a TODO
>>> at this time.
>>>
>>> On Mon, Jan 5, 2015 at 12:34 PM, Amuthan <[email protected]> wrote:
>>>
>>>> Hi Amit: yes, the idea is to have just two DArrays, one each for the
>>>> previous and current iterations. I had some trouble assigning values
>>>> directly to a DArray (a setindex! error) and so had to write it like this.
>>>> Do you know of any way around this?
>>>>
>>>> Btw, the parallel code runs slower than the serial version even for
>>>> just one iteration.
>>>>
>>>> On Sun, Jan 4, 2015 at 10:27 PM, Amit Murthy <[email protected]>
>>>> wrote:
>>>>
>>>>> As written, this is creating 1000 DArrays. I think you intended to
>>>>> have only 2 of them and swap values in each iteration?
>>>>>
>>>>>
>>>>> On Sunday, 4 January 2015 11:07:47 UTC+5:30, Amuthan A. Ramabathiran
>>>>> wrote:
>>>>>>
>>>>>> Hello: I recently started exploring the parallel capabilities of
>>>>>> Julia and I need some help in understanding and improving the
>>>>>> performance of a very elementary parallel code using DArrays (I use
>>>>>> Julia version 0.4.0-dev+2431). The code pasted below (based essentially
>>>>>> on plife.jl) solves u''(x) = 0, x \in [0,1] with u(0) and u(1) specified,
>>>>>> using the 2nd order central difference approximation. The parallel
>>>>>> version of the code runs significantly slower than the serial version.
>>>>>> It would be nice if someone could point out ways to improve this and/or
>>>>>> suggest an alternative efficient version.
>>>>>>
>>>>>> function laplace_1D_serial(u::Array{Float64})
>>>>>>    N = length(u) - 2
>>>>>>    u_new = zeros(N)
>>>>>>
>>>>>>    for i = 1:N
>>>>>>       u_new[i] = 0.5(u[i] + u[i + 2])
>>>>>>    end
>>>>>>
>>>>>>    u_new
>>>>>> end
>>>>>>
>>>>>> function serial_iterate(u::Array{Float64})
>>>>>>    u_new = laplace_1D_serial(u)
>>>>>>
>>>>>>    for i = 1:length(u_new)
>>>>>>       u[i + 1] = u_new[i]
>>>>>>    end
>>>>>> end
>>>>>>
>>>>>> function parallel_iterate(u::DArray)
>>>>>>    DArray(size(u), procs(u)) do I
>>>>>>       J = I[1]
>>>>>>
>>>>>>       if myid() == 2
>>>>>>          local_array = zeros(length(J) + 1)
>>>>>>          for i = J[1] : J[end] + 1
>>>>>>             local_array[i - J[1] + 1] = u[i]
>>>>>>          end
>>>>>>          append!([float(u[1])], laplace_1D_serial(local_array))
>>>>>>
>>>>>>       elseif myid() == length(procs(u)) + 1
>>>>>>          local_array = zeros(length(J) + 1)
>>>>>>          for i = J[1] - 1 : J[end]
>>>>>>             local_array[i - J[1] + 2] = u[i]
>>>>>>          end
>>>>>>          append!(laplace_1D_serial(local_array), [float(u[end])])
>>>>>>
>>>>>>       else
>>>>>>          local_array = zeros(length(J) + 2)
>>>>>>          for i = J[1] - 1 : J[end] + 1
>>>>>>             local_array[i - J[1] + 2] = u[i]
>>>>>>          end
>>>>>>          laplace_1D_serial(local_array)
>>>>>>
>>>>>>       end
>>>>>>    end
>>>>>> end
>>>>>>
>>>>>> A sample run on my laptop with 4 processors:
>>>>>> julia> u = zeros(1000); u[end] = 1.0; u_distributed = distribute(u);
>>>>>>
>>>>>> julia> @time for i = 1:1000
>>>>>>          serial_iterate(u)
>>>>>>        end
>>>>>> elapsed time: 0.011452192 seconds (8300112 bytes allocated)
>>>>>>
>>>>>> julia> @time for i = 1:1000
>>>>>>          u_distributed = parallel_iterate(u_distributed)
>>>>>>        end
>>>>>> elapsed time: 4.461922218 seconds (190565036 bytes allocated, 10.17% gc time)
>>>>>>
>>>>>> Thanks for your help!
>>>>>>
>>>>>> Cheers,
>>>>>> Amuthan
>>>>>>
>>>>>>
>>>>>>
>>>>
>>>
>>
>
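
The two-buffer approach suggested in the thread (keep one array for the previous iterate and one for the current, and swap them each sweep) can be sketched for the serial case as follows. This is a minimal illustration, not code from the thread: the names `jacobi_step!` and `jacobi` are hypothetical, and it deliberately does not use DArrays.

```julia
# One sweep for u''(x) = 0 with the 2nd order central difference:
# each interior point becomes the average of its two neighbours.
# The boundary values u[1] and u[end] are copied unchanged.
function jacobi_step!(u_new::Vector{Float64}, u::Vector{Float64})
    n = length(u)
    u_new[1] = u[1]
    u_new[n] = u[n]
    for i = 2:n-1
        u_new[i] = 0.5 * (u[i-1] + u[i+1])
    end
    return u_new
end

# Run `iters` sweeps with only two buffers, swapping their roles each time.
# (Hypothetical helper; the input array u0 is left untouched.)
function jacobi(u0::Vector{Float64}, iters::Int)
    u = copy(u0)
    u_new = similar(u)
    for k = 1:iters
        jacobi_step!(u_new, u)
        u, u_new = u_new, u   # swap bindings; no new arrays are allocated
    end
    return u
end

# Small example: the iterates approach the straight line between the
# two boundary values.
u0 = zeros(5); u0[end] = 1.0
u = jacobi(u0, 200)
```

Unlike `serial_iterate` above, which allocates a fresh `u_new` on every call, this version allocates its two buffers once, outside the loop.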
