On Wednesday, December 18, 2013 4:34:48 AM UTC, Amit Murthy wrote:
>
> distribute creates a new DArray from a regular array. It does this by
> allocating parts of the regular array on each of the workers.
> "remotecall_fetch(owner, ()->fetch(rr)[I...])" is the init function
> passed to the DArray constructor. On each worker, it pulls in its part
> of the regular array.
>
> While the DArray itself is created in parallel, 1000 workers is a lot;
> typically one would expect 1 worker per CPU core. Of course, if you do
> have access to 1000 cores, it will be a good idea to try it out, though
> I suspect we may see other issues too as a result of it.
>
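To see the pull mechanism Amit describes in action, here is a small
experiment, sketched against the 0.2-era API used in the quoted source
(RemoteRef, put, and remotecall_fetch(pid, f) were later renamed), with
localpart assumed as the accessor for a worker's local chunk:

    # Pull-based distribution: each worker fetches only its own chunk.
    addprocs(3)                 # rule of thumb: one worker per CPU core

    a = rand(12)                # an ordinary local array on this process
    d = distribute(a)           # workers pull their chunks from here

    # Ask each worker how much of the array it ended up holding.
    for w in workers()
        n = remotecall_fetch(w, () -> length(localpart(d)))
        println("worker $w holds $n of $(length(a)) elements")
    end
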
I assume this means that DArrays are not horizontally scalable. What other
solutions are out there for Julia when working with big data? Is Julia
itself considered a complete system for working with distributed data, or
do people use Julia on top of other frameworks?

> On Tue, Dec 17, 2013 at 5:13 PM, David C Cohen <[email protected]> wrote:
>
>> Hi,
>>
>> According to this
>> <https://github.com/JuliaLang/julia/blob/b4fa86124dd1cb298373c3bef3f98c060cbb19b8/base/darray.jl#L160-L167>,
>> distribute is defined as:
>>
>> function distribute(a::AbstractArray)
>>     owner = myid()
>>     rr = RemoteRef()
>>     put(rr, a)
>>     DArray(size(a)) do I
>>         remotecall_fetch(owner, ()->fetch(rr)[I...])
>>     end
>> end
>>
>> I'm trying to find out how this sends data to the workers.
>>
>> - Here rr is a remote reference to the local machine, and put(rr, a)
>>   then sends array a to the local machine. That doesn't make sense to
>>   me.
>> - When, in whatever way I'm missing, the data of array a is sent to
>>   the workers, does the network I/O happen in parallel or in series?
>> - If we have 1000 workers, sending the data in parallel means network
>>   overload, and sending it in series means taking a long time. What's
>>   a good way of working with larger distributed arrays, or of
>>   distributing an array over very many workers?
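
On the first two bullets: as far as I can tell, put(rr, a) does not send
anything anywhere; it just stores a in the RemoteRef on the calling
process. The network transfer happens later, when each worker runs the
init function: remotecall_fetch(owner, ...) executes the closure on the
owner, which slices the array locally, so only the requested slice crosses
the wire. A sketch of that single hop (same 0.2-era API; worker id 2 and
the 1:5 slice are arbitrary choices for illustration):

    addprocs(2)

    a = rand(10)
    owner = myid()
    rr = RemoteRef()
    put(rr, a)          # stores `a` in the ref locally; no network traffic yet

    # What a worker's init call effectively does: run the closure on the
    # owner, which does fetch(rr)[1:5] locally and ships back the slice.
    chunk = fetch(@spawnat 2 remotecall_fetch(owner, () -> fetch(rr)[1:5]))

And on parallel vs. serial: the DArray constructor appears to launch the
per-chunk init with remotecall, which returns immediately, so the pulls
overlap rather than run one after another. Every byte still originates
from the single owner process, though, which is presumably where 1000
workers would start to hurt.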
