Suppose I want to implement a distributed topic model based on the
following pseudocode:
-----
Init:
Assign each document d to a processor p.
Initialize a few large global matrices M_g.
For each document d, load a few read-only matrices M_dr and a few mutable
matrices M_dw.
Repeat:
For each processor p in parallel:
1. Copy the global matrices M_g, and call the copies M_gp.
2. For each document assigned to p, do something that reads from
{M_gp,M_dr,M_dw} and writes to {M_gp,M_dw}.
3. Synchronize.
4. Update M_g by doing some kind of reduce-step on the old value of M_g and
all the local copies of M_gp.
-----
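Here is roughly how I picture the outer loop in Julia, using @spawnat/fetch
over the workers; process_docs!, reduce_globals, docs_for, and the sizes are
placeholder names/values I made up for the model-specific pieces, so this is
only a sketch:
-----
@everywhere function process_docs!(M_gp, my_docs)
    # step 2: for each document in my_docs, read {M_gp, M_dr, M_dw} and
    # write {M_gp, M_dw}; return the locally updated copy of the globals
    return M_gp
end

@everywhere docs_for(p) = Int[]  # placeholder: documents assigned to worker p

# stand-in reduce: fold each worker's changes (relative to the old M_g) back in
reduce_globals(M_g, locals) = M_g + sum([L - M_g for L in locals])

K, V, n_iters = 20, 10000, 100   # made-up sizes
M_g = zeros(K, V)                # stand-in for the global matrices

for iter = 1:n_iters
    # steps 1-2: each worker gets its own copy of M_g and processes its documents
    refs = [(@spawnat p process_docs!(copy(M_g), docs_for(p))) for p in workers()]
    # step 3: fetch blocks until each worker finishes, so it doubles as the barrier
    local_copies = [fetch(r) for r in refs]
    # step 4: some kind of reduce over the old M_g and the local copies
    M_g = reduce_globals(M_g, local_copies)
end
-----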
What is best practice for a scheme like this?
At what level of granularity should I make the RemoteRefs?
- per processor, per document, per document/matrix, or per
  document/matrix/element?
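For example, at per-processor granularity I picture one RemoteRef per worker,
holding that worker's local copy M_gp (I'm assuming RemoteRef(p) keeps the
store on worker p):
-----
gp_refs = [RemoteRef(p) for p in workers()]   # one ref per worker
for (i, p) in enumerate(workers())
    put(gp_refs[i], copy(M_g))   # seed worker p's local copy M_gp
end
-----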
How should the mutation be done, assuming I am only changing individual
elements of large matrices?
- if the refs were single elements, I would try put(r, update(take(r))),
  but I'm not sure how to do this when the refs are matrices.
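The only generalization I can see is to take() the whole matrix, touch the
elements I care about, and put() it back (bump! is just a name I made up),
which feels heavy when only one element changes:
-----
function bump!(r::RemoteRef, i, j, delta)
    M = take(r)        # removes the matrix from the ref
    M[i, j] += delta   # mutate a single element in place
    put(r, M)          # put it back so other tasks can take() it again
end
-----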
Can I use a "@parallel (custom_reduce_fn) for" loop to iterate over the
documents? Will it automatically assign each document to the process that
holds the most of the RemoteRefs that the body of the loop accesses?
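Concretely, I mean something like this, where n_docs, doc_update, and
merge_contribs are placeholders for the document count, the per-document work,
and the custom reducer:
-----
totals = @parallel (merge_contribs) for d = 1:n_docs
    doc_update(d)   # each iteration returns a local contribution to be merged
end
-----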
If I can't use that construct, do I need to collect an array of the
RemoteRefs returned by spawning on each processor, and then do the reduce
myself?
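i.e., reusing the placeholder names from the sketches above, something like:
-----
refs = [(@spawnat p process_docs!(copy(M_g), docs_for(p))) for p in workers()]
M_g  = reduce(merge_contribs, M_g, map(fetch, refs))   # fold results back into M_g
-----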
-----
Thanks in advance for any advice offered. If I get an elegant version of
this running, I would be happy to make it into a mini parallelism-in-Julia
tutorial.
Daniel