I have a Markov Chain Monte Carlo algorithm. Unlike most MCMC algorithms, each state is expensive to compute, but each sample can be generated independently. Because evaluating the Markov chain itself is very cheap, the algorithm should be almost embarrassingly parallel.
My current implementation is shown below. The idea is that propose_parallel_tl generates a stack of boxes. The main loop pops items off this stack, and when the stack becomes empty it dispatches all the workers to replenish it.
Unfortunately this assumes that each worker takes roughly the same amount of time. Additionally, the [fetch(s) for s in spawns] means, I presume, that the main process just sits idle waiting for the workers to finish.
How can I get around this? I effectively want all of my workers working all the time, with just a single process taking new proposals from the workers as they arrive. Since stepping the chain itself is cheap compared to generating a proposal, it would be a waste to dedicate an entire physical core to that consumer (I'm not sure whether the operating system would sort this out for me). A rough sketch of what I have in mind is at the bottom of this post.
# Propose boxes in parallel
function propose_parallel_tl(f, Y, X, stack; ncores = 1, args...)
    ncores = min(ncores, nprocs())
    println("Using $ncores cores")
    if isempty(stack)
        # Refill: spawn one proposal per core, then block until all are fetched
        spawns = [@spawn proposebox_tl(f, Y, X; args...) for i = 1:ncores]
        boxes = [fetch(s) for s in spawns]
        push!(stack, boxes...)
    end
    return pop!(stack)
end
function pre_tlmh{D <: Domain}(f::Callable, Y, X::D, niters; args...)
    boxes = D[]
    stack = (D, Float64, Float64)[] # For parallelism
    box, logq, prevolfrac = propose_parallel_tl(f, Y, X, stack; args...)
    push!(boxes, box)
    naccepted = 0; nsteps = 0
    while nsteps < niters - 1
        nextbox, nextlogq, prevolfrac = propose_parallel_tl(f, Y, X, stack; args...)
        nextlogp = logmeasure(nextbox) + log(prevolfrac)
        ...
        nsteps += 1
    end
    boxes
end
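
To make the intent concrete, here is a rough, untested sketch of the kind of producer/consumer structure I have in mind. The name pre_tlmh_async is made up; it uses one @async feeder task per worker plus a Condition to park the consumer, assumes proposebox_tl can be called on any worker as above, and elides the accept/reject step just like the code above:

# Sketch only: one @async "feeder" task per worker keeps that worker permanently
# busy generating proposals; the main task consumes them as they arrive and
# sleeps on a Condition whenever none are ready.
function pre_tlmh_async{D <: Domain}(f::Callable, Y, X::D, niters; args...)
    boxes = D[]
    queue = (D, Float64, Float64)[]   # completed proposals land here
    cond  = Condition()               # wakes the consumer when a proposal arrives
    done  = false
    @sync begin
        # Producers: each worker loops until the chain has enough samples
        for p in workers()
            @async while !done
                r = fetch(@spawnat p proposebox_tl(f, Y, X; args...))
                push!(queue, r)
                notify(cond)
            end
        end
        # Consumer: runs in the main task and blocks (yielding to the feeders)
        # whenever no proposal is ready, so it should cost almost nothing
        nsteps = 0
        while nsteps < niters
            while isempty(queue)
                wait(cond)
            end
            box, logq, prevolfrac = shift!(queue)
            # ... Metropolis-Hastings accept/reject step as in pre_tlmh above ...
            push!(boxes, box)
            nsteps += 1
        end
        done = true   # feeders exit once their in-flight proposal completes
    end
    boxes
end

I'm not sure whether this is the idiomatic way to do it, or whether the scheduler really does keep the main process mostly idle while the feeders are blocked in fetch.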