[julia-users] Re: moving data to workers for distributed workloads

Michael Eastwood Thu, 19 May 2016 12:19:36 -0700

Hi Fabian,

It looks like that SO answer moves the data into the global scope of each 
worker. That is probably worth experimenting with, but I'd be worried about 
the performance implications of non-const global variables. It is probably 
still a win for my use case, though. Thanks for the link.
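
For reference, here is a minimal sketch of that idea which sidesteps the 
non-const global concern by storing the data in a typed `const` Ref on each 
worker. It assumes a current Julia with the Distributed standard library 
(not the 0.4 API), and the names STATIC_INPUT, store_static! and transform 
are hypothetical stand-ins, not anything from LibHealpix.jl or the SO answer.

```julia
using Distributed

# Start two local workers for the demo if none exist yet (assumption: one machine).
nprocs() == 1 && addprocs(2)

@everywhere begin
    # Hypothetical per-process cache. The binding is const and concretely typed,
    # so code reading it does not pay the non-const global penalty.
    const STATIC_INPUT = Ref{Vector{Float64}}()

    # Hypothetical helper: store the large, unchanging vector on this process.
    function store_static!(data::Vector{Float64})
        STATIC_INPUT[] = data
        return nothing
    end

    # Hypothetical stand-in for the real transform: it combines the cached
    # vector with the small per-iteration argument.
    transform(scale::Float64) = scale * sum(STATIC_INPUT[])
end

static_part = collect(1.0:10.0)   # stands in for the large vector

# Ship the large vector to each worker exactly once.
for w in workers()
    remotecall_wait(store_static!, w, static_part)
end

# Later calls only send the small per-iteration part.
results = [remotecall_fetch(transform, w, 2.0) for w in workers()]
```

The helper functions are defined by name on every process, so each call 
only serializes its arguments and never the cached vector again.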


I'm using alm2map and map2alm from LibHealpix.jl (
https://github.com/mweastwood/LibHealpix.jl) to do the spherical harmonic 
transforms.

Michael

On Thursday, May 19, 2016 at 12:45:05 AM UTC-7, Fabian Gans wrote:
>
> Hi Michael, 
>
> I recently had a similar problem and this SO thread helped me a lot: 
> http://stackoverflow.com/questions/27677399/julia-how-to-copy-data-to-another-processor-in-julia
>
> As a side question: which code are you using to calculate spherical 
> harmonic transforms? I was looking for a Julia package some time ago and 
> did not find any, so if your code is publicly available, could you point me 
> to it? 
>
> Thanks
> Fabian
>
>
>
> On Wednesday, May 18, 2016 at 9:28:34 PM UTC+2, Michael Eastwood wrote:
>>
>> Hi julia-users!
>>
>> I need some help with distributing a workload across tens of workers 
>> across several machines.
>>
>> The problem I am trying to solve involves calculating the elements of a 
>> large block diagonal matrix. The total size of the blocks is >500 GB, so I 
>> cannot store the entire thing in memory. The way the calculation works, I 
>> do a bunch of spherical harmonic transforms and the results give me one row 
>> in each block of the matrix.
>>
>> The following code illustrates what I am doing currently. I am 
>> distributing the spherical harmonic transforms amongst all the workers and 
>> bringing the data back to the master process to write the results to disk 
>> (the master process has each matrix block mmapped to disk).
>>
>>     idx = 1
>>     limit = 10000
>>     nextidx() = (myidx = idx; idx += 1; myidx)
>>     @sync for worker in workers()
>>         @async while true
>>             myidx = nextidx()
>>             myidx ≤ limit || break
>>             coefficients = remotecall_fetch(spherical_harmonic_transforms, worker, input[myidx])
>>             write_results_to_disk(coefficients)
>>         end
>>     end
>>
>> Each spherical harmonic transform takes O(10 seconds) so I thought the 
>> data movement cost would be negligible compared to this. However, if I have 
>> three machines each with 16 workers, machine 1 will have all 16 workers 
>> working hard (the master process is on machine 1) and machines 2&3 will 
>> have most of their workers idling. My hypothesis is that the cost of moving 
>> the data to and from the workers is preventing machines 2&3 from being 
>> fully utilized.
>>
>> coefficients is a vector of a million Complex128s (16 MB)
>> input is composed of two parts: 1) a vector of 10 million Float64s (100 
>> MB) and 2) a small amount of additional information that is negligibly 
>> small compared to the first part.
>>
>> The trick is that the first part of input (the 100 MB vector) doesn't 
>> change between iterations. So I could alleviate most of the data movement 
>> problem by moving that part to each worker once. Problem is that I can't 
>> seem to figure out how to do that. The manual (
>> http://docs.julialang.org/en/release-0.4/manual/parallel-computing/#remoterefs-and-abstractchannels)
>> is a little thin on how to use RemoteRefs.
>>
>> So how do you move data to workers in a way that it can be re-used on 
>> subsequent iterations? An example in the manual would be very helpful!
>>
>> Thanks,
>> Michael
>>
>
