Hi,

I am trying to figure out how to parallelize a slightly convoluted Monte 
Carlo simulation in Julia (0.4.6), but have a hard time figuring out the 
"best"/"recommended" way of doing it. The non-parallel program structure 
goes like this:

   1. Initialize a (large) Monte Carlo state object (of its own type), 
   which is going to be updated using a Markov Chain Monte Carlo update 
   algorithm. Say,
   x = MCState()
   In my case this is NOT an array, but a linked list/graph structure. The 
   state object also contains some parameters, to be iteratively determined.
   2. Do n Monte Carlo updates (which change the state x) and gather some 
   data from this in a dataobject.
   for it=1:n
   doMCupdate!(x,dataobject)
   end
   3. Based on the gathered data, the parameters of the MC state should be 
   updated,
   updateparameters!(x,dataobject)
   4. Repeat from 2 until convergence by some measure.
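For concreteness, the serial structure might be sketched like this (MCState, doMCupdate! and updateparameters! are the names from the steps above; DataObject and converged are hypothetical stand-ins):

```julia
x = MCState()                        # 1. large linked-list/graph state
while true
    dataobject = DataObject()        #    fresh data accumulator per round
    for it = 1:n                     # 2. n MCMC updates, mutating x in place
        doMCupdate!(x, dataobject)
    end
    updateparameters!(x, dataobject) # 3. refine parameters from gathered data
    converged(x) && break            # 4. repeat until convergence
end
```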

*Ideally*, the parallel code should read something like this:

   1. Initialize a Monte Carlo state object on each worker. The state is 
   large (in memory), so it should not be copied/moved around between workers.
   2. Do independent Monte Carlo updates on each worker, collecting the 
   data in independent dataobjects.
   3. Gather all the relevant data of the dataobjects on the master 
   process. Calculate what the new parameters should be based on these 
   (compared to the non-parallel case, statistically improved) data. 
   Distribute these parameters back to the Monte Carlo state objects on each 
   worker process.
   4. Repeat from 2 until convergence by some measure.
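One way to sketch this outline in Julia 0.4 is to keep the state in a worker-local global (created once with @everywhere) and drive each round from the master with remotecall (note the 0.4 argument order: worker id first). Here workerround, mergedata, computeparameters, initialparameters and converged are hypothetical names; only the small dataobjects and parameter values cross process boundaries, never the big states:

```julia
addprocs(4)

@everywhere x = MCState()            # 1. one large state per worker, never moved

@everywhere function workerround(n, newparams)
    x.parameters = newparams         # 3b. (receive) parameters broadcast by master
    dataobject = DataObject()
    for it = 1:n                     # 2.  independent updates on this worker's x
        doMCupdate!(x, dataobject)
    end
    dataobject                       #     small, so cheap to send back
end

params = initialparameters()
while true
    rrs = [remotecall(p, workerround, n, params) for p in workers()]
    combined = reduce(mergedata, map(fetch, rrs))  # 3a. gather data on master
    params = computeparameters(combined)           #     statistically improved
    converged(params) && break                     # 4.  until convergence
end
```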

The question is: What is the "best" way of accomplishing this in Julia? 

As long as the entire program is wrapped within the same function/global 
scope, the parallel case can be accomplished with @everywhere, 
@parallel for, and @eval @everywhere x.parameters = $newparameters (for 
broadcasting the new parameters from the master to the workers). This, 
however, results in long, ugly code, which probably isn't very efficient 
from a compiler point of view. I would rather pass the parallel 
MCState objects between the various steps in the algorithm, as in the 
non-parallel case. This could (should?) maybe be achieved with 
RemoteRefs? However, RemoteRefs are references to the results of a 
calculation rather than to the objects on which the calculations are 
performed. The objects could of course be accessed through clever use of 
identity functions, the put!() function, etc., but again the approach 
seems rather inelegant/"hackish" to me...
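That said, a RemoteRef can serve as a handle to a worker-resident object if the work is always shipped to the owning process instead of fetching the object. A sketch in 0.4 syntax (onstate and runupdates! are hypothetical helpers; r.where is the id of the process holding the ref's value):

```julia
# Create one state per worker; each MCState is constructed on p and stays
# there. The master only holds lightweight RemoteRef handles.
staterefs = [remotecall(p, MCState) for p in workers()]

# Apply f(state, args...) on the process owning the ref. fetch() of a
# locally stored ref does not copy, so the large state never leaves its
# worker; only args and the (small) return value are serialized.
onstate(f, r, args...) =
    remotecall_fetch(r.where, (rr, a...) -> f(fetch(rr), a...), r, args...)

# Example round: run n updates against each remote state, gather the data.
@everywhere function runupdates!(x, n)
    dataobject = DataObject()
    for it = 1:n
        doMCupdate!(x, dataobject)
    end
    dataobject
end

datas = [onstate(runupdates!, r, n) for r in staterefs]
```

The staterefs can then be passed between functions much like the serial x, which is closer to the non-parallel program structure.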

To summarize/generalize: I'm wondering about how to deal with independent 
objects defined on each worker process. How to pass them between functions 
in parallel. How to gather information from them to the master process. How 
to broadcast information from the master to the workers... To me, my 
problem seems to be somewhat beyond the @parallel for, pmap and similar 
"distribute calculations and gather the result and that's it" approaches 
explained in the documentation and elsewhere. However, I'm sure there is a 
natural way to deal with it in Julia. After all, I'm trying to achieve a 
rather generic parallel programming pattern.

Any suggestions/ideas are very welcome!
