Hey, there are some things that changed in v0.5, so I would suggest that you start this part of the project on v0.5.
That said, I think you have to build the tools yourself using the basic parallel macros. You might want to look into ParallelDataTransfer.jl <https://github.com/ChrisRackauckas/ParallelDataTransfer.jl>. It's built off a solution from StackExchange a while ago, though there is a relevant bug you'll need to help us squash <https://github.com/ChrisRackauckas/ParallelDataTransfer.jl/issues/1>. Anything you find helpful in this area I would love to have as a contribution to this package. It would be helpful to the community to have a curated repository of these functions/macros.

On Thursday, September 1, 2016 at 8:24:21 AM UTC-7, Sleort wrote:
>
> Hi,
>
> I am trying to figure out how to parallelize a slightly convoluted Monte Carlo simulation in Julia (0.4.6), but have a hard time figuring out the "best"/"recommended" way of doing it. The non-parallel program structure goes like this:
>
> 1. Initialize a (large) Monte Carlo state object (of its own type), which is going to be updated using a Markov chain Monte Carlo update algorithm. Say,
>        x = MCState()
>    In my case this is NOT an array, but a linked list/graph structure. The state object also contains some parameters, to be iteratively determined.
> 2. Do n Monte Carlo updates (which change the state x) and gather some data from this in a dataobject:
>        for it = 1:n
>            doMCupdate!(x, dataobject)
>        end
> 3. Based on the gathered data, the parameters of the MC state should be updated:
>        updateparameters!(x, dataobject)
> 4. Repeat from 2 until convergence by some measure.
>
> *Ideally*, the parallel code should read something like this:
>
> 1. Initialize a Monte Carlo state object on each worker. The state is large (in memory), so it should not be copied/moved around between workers.
> 2. Do independent Monte Carlo updates on each worker, collecting the data in independent dataobjects.
> 3. Gather all the relevant data of the dataobjects on the master process.
>    Calculate what the new parameters should be based on these (compared to the non-parallel case, statistically improved) data. Distribute these parameters back to the Monte Carlo state objects on each worker process.
> 4. Repeat from 2 until convergence by some measure.
>
> The question is: What is the "best" way of accomplishing this in Julia?
>
> As long as the entire program is wrapped within the same function/global scope, the parallel case can be accomplished by the use of @everywhere, @parallel for, and @eval @everywhere x.parameters = $newparameters (for broadcasting the new parameters from the master to the workers). This, however, results in long, ugly code, which probably isn't very efficient from a compiler point of view. I would rather like to pass the parallel MCState objects between the various steps in the algorithm, like in the non-parallel way. This could (should?) maybe be achieved with the use of RemoteRefs? However, RemoteRefs are references to the results of a calculation rather than to the objects on which the calculations are performed. The objects could of course be accessed by clever use of identity functions, the put() function, etc., but again the approach seems rather inelegant/"hackish" to me...
>
> To summarize/generalize: I'm wondering how to deal with independent objects defined on each worker process. How to pass them between functions in parallel. How to gather information from them to the master process. How to broadcast information from the master to the workers... To me, my problem seems to be somewhat beyond the @parallel for, pmap, and similar "distribute calculations and gather the result and that's it" approaches explained in the documentation and elsewhere. However, I'm sure there is a natural way to deal with it in Julia. After all, I'm trying to achieve a rather generic parallel programming pattern.
>
> Any suggestions/ideas are very welcome!
>
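To make the "build it yourself from the basic primitives" suggestion concrete: the pattern in steps 1–4 above can be sketched on v0.5 with `@everywhere` and remote calls on worker-local globals. The trick is that a closure referencing a global resolves to that worker's own copy, so the big state never moves; only the small data and parameters cross process boundaries. This is just a toy sketch — `MCState`, `doMCupdates!`, and the parameter update rule here are stand-ins for your own types and functions, not anything from Base or a package:

```julia
addprocs(2)

# Toy stand-in for the poster's state type (0.5 `type` syntax).
@everywhere type MCState
    parameter::Float64
    data::Float64      # stands in for the gathered data object
end

@everywhere function doMCupdates!(x::MCState, n::Int)
    for it in 1:n
        x.data += x.parameter * rand()   # toy "MC update"
    end
end

# 1. One state per process, stored as a global; it never leaves its worker.
@everywhere x = MCState(1.0, 0.0)

for iter in 1:5   # 4. repeat until convergence; fixed count for the sketch
    # 2. Independent updates on each worker, concurrently.
    @sync for p in workers()
        @async remotecall_wait(() -> doMCupdates!(x, 1000), p)
    end
    # 3. Gather only the small data objects, not the states.
    data = [remotecall_fetch(() -> x.data, p) for p in workers()]
    newparam = 1.0 / (1.0 + mean(data))  # placeholder parameter update
    # Broadcast the new parameter back into each worker's state.
    for p in workers()
        remotecall_wait(v -> (x.parameter = v; nothing), p, newparam)
    end
end
```

Note the v0.5 argument order, `remotecall_fetch(f, pid, args...)` — the function comes first, which is one of the changes from 0.4.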
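For the master↔worker transfers specifically, the helpers in ParallelDataTransfer.jl boil down to roughly the following (adapted from the StackExchange answer the package is based on; 0.5 syntax, so treat this as a sketch rather than the package's exact current source):

```julia
# Assign the keyword arguments as globals in Main on process p.
function sendto(p::Int; args...)
    for (nm, val) in args
        @spawnat(p, eval(Main, Expr(:(=), nm, val)))
    end
end

# Fetch the value of global `nm` from process p.
getfrom(p::Int, nm::Symbol; mod=Main) = fetch(@spawnat(p, getfield(mod, nm)))
```

Usage would look like `sendto(2, newparameters = newparameters)` followed on the master by `d = getfrom(2, :dataobject)` — i.e. exactly the broadcast/gather steps in the pattern above, without `@eval @everywhere` string-threading through global scope.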
