Suppose I want to implement a distributed topic model based on the 
following pseudocode:

-----
Init:

Assign each document d to a processor p.
Initialize a few large global matrices M_g.
For each document, load a few read-only matrices M_dr and a few mutable 
matrices M_dw that get written to.

Repeat:
For each processor p in parallel:
1. Copy the global matrices M_g, and call the copies M_gp.
2. For each document assigned to p, do something that reads from 
{M_gp,M_dr,M_dw} and writes to {M_gp,M_dw}.
3. Synchronize.
4. Update M_g by doing some kind of reduce-step on the old value of M_g and 
all the local copies of M_gp.

-----
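For reference, here is the rough skeleton I have in mind for the Init step, 
using one RemoteRef per processor. Everything named here (num_docs, 
init_global_matrices, load_doc_matrices) is a placeholder for my own code, 
and I'm assuming the workers()/nworkers() helpers are available:

num_docs = 1000                   # placeholder
M_g = init_global_matrices()      # placeholder; global matrices live on the master process

# assign documents to workers round-robin
doc_owner = [workers()[mod(d - 1, nworkers()) + 1] for d = 1:num_docs]

# each worker loads M_dr and M_dw for its own documents;
# @spawnat returns one RemoteRef per processor
local_state = [@spawnat p load_doc_matrices(find(doc_owner .== p)) for p in workers()]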

What is best practice for a scheme like this? 

At what level of granularity should I make the RemoteRefs? 
  - per processor, per document, per document/matrix, per 
document/matrix/element?

How should the mutation be done, assuming I am only changing individual 
elements of large matrices?
  - if the refs were single elements, I would try put(r,update(take(r))), 
but I'm not sure how to do this if the refs are matrices
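
To make that concrete, this is the pattern I would naively write if each 
ref holds a whole matrix, though it seems to round-trip the entire matrix 
just to touch one element (update_element is just my own sketch, not an 
existing function):

function update_element(r::RemoteRef, i, j, delta)
    M = take(r)        # removes the matrix from the ref
    M[i, j] += delta   # change a single element
    put(r, M)          # store the updated matrix back
end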

Can I use a "@parallel (custom_reduce_fn) for" loop to loop over documents? 
Will it automatically assign each document to the process that contains the 
most RemoteRefs that the body of the for loop accesses?
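
To be clear, this is the construct I mean, with combine_counts and 
process_document standing in for my reduction function and per-document 
update:

total = @parallel (combine_counts) for d = 1:num_docs
    process_document(d)   # the last expression is this document's contribution to the reduction
end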

If I can't use this construct, do I need to build an array of the 
RemoteRefs returned by spawning on each processor, and then do the reduce 
myself?
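
In other words, something like the following, where sweep_documents and 
reduce_step are placeholders for steps 2 and 4 of the pseudocode above:

refs = [@spawnat p sweep_documents(p, M_g) for p in workers()]   # each worker gets its own copy of M_g
locals = map(fetch, refs)        # wait for every worker's local copy M_gp
M_g = reduce_step(M_g, locals)   # combine the old M_g with all the local copies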

-----

Thanks in advance for any advice offered. If I get an elegant version of 
this running, I would be happy to make it into a mini parallelism-in-Julia 
tutorial.

Daniel
