I'm still having problems understanding the basic concepts of
parallelization in Julia. It seems to me that the examples in the
documentation, and those that I found elsewhere on the web, don't really
reflect my use case, so I'm wondering whether I'm approaching the problem
from the right angle. I've written a short piece of code that illustrates
what I'm trying to do; basically it's a large number of small calculations,
the results of which have to be stored in one large matrix.
Here's the example:
addprocs(3)

agrid = linspace(1, 4, 4)
bgrid = linspace(-1.05, 1.05, 30)
cgrid = linspace(-0.1, 0.1, 40)
dgrid = linspace(0.5, 1000, 40)

result = SharedArray(Float64,
                     (size(agrid,1), size(bgrid,1), size(cgrid,1), size(dgrid,1)),
                     pids=procs())

@everywhere function calculate(a, b, c, d)
    quadgk(cos, -b*10π, c*10π)[1] + quadgk(sin, -b*10π, c*10π)[1]*d
end

function solveall()
    for a = 1:length(agrid)
        for b = 1:length(bgrid)
            for c = 1:length(cgrid)
                @parallel for d = 1:length(dgrid)
                    result[a,b,c,d] = calculate(agrid[a], bgrid[b], cgrid[c], dgrid[d])
                end
            end
        end
    end
    return result
end

@time solveall()
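For comparison, here's a flattened variant I've been sketching, using the same grids, `result` array, and `calculate` as above: a single `@parallel` loop over all 4·30·40·40 linear indices, with `ind2sub` recovering the (a,b,c,d) subscripts, so each worker gets one big chunk of iterations instead of at most 40 at a time. (`solveall_flat` is just a name I made up for this sketch, and I've put `@sync` in front so `@time` actually waits for the workers to finish.)

```julia
function solveall_flat()
    dims = (length(agrid), length(bgrid), length(cgrid), length(dgrid))
    # One parallel loop over the whole Cartesian product; @sync blocks
    # until every worker has finished writing into the SharedArray.
    @sync @parallel for i = 1:prod(dims)
        a, b, c, d = ind2sub(dims, i)   # linear index -> (a,b,c,d) subscripts
        result[a,b,c,d] = calculate(agrid[a], bgrid[b], cgrid[c], dgrid[d])
    end
    return result
end

@time solveall_flat()
```

I haven't benchmarked this carefully, but it should at least cut the number of `@parallel` dispatches from 4·30·40 down to one.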
Unfortunately, the speedup from parallelizing the inner loop isn't great
(going from ~9s to ~7.5s on my machine), so I'm wondering whether this is
actually the best way of implementing the parallelization. My original
idea was to somehow parallelize the outer loop, so that each worker
returns a 30x40x40 array, but I don't see how I can get the worker
processes to run the inner loops correctly.
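In code, the outer-loop idea would look roughly like this, reusing the grids and `calculate` from above: each worker computes one full 30x40x40 slice for a single value of a, and `pmap` collects the slices on the master. (`calcslice` and `solveall_outer` are just helper names I made up for this sketch; I haven't verified how it performs.)

```julia
# Each worker builds the whole 30x40x40 slice for one value of `a`.
@everywhere function calcslice(a, bgrid, cgrid, dgrid)
    slice = Array(Float64, length(bgrid), length(cgrid), length(dgrid))
    for b = 1:length(bgrid), c = 1:length(cgrid), d = 1:length(dgrid)
        slice[b,c,d] = calculate(a, bgrid[b], cgrid[c], dgrid[d])
    end
    return slice
end

function solveall_outer()
    # pmap hands one value of `a` to each free worker and returns the
    # computed slices in order.
    slices = pmap(a -> calcslice(a, bgrid, cgrid, dgrid), agrid)
    out = Array(Float64, length(agrid), length(bgrid), length(cgrid), length(dgrid))
    for i = 1:length(agrid)
        out[i,:,:,:] = slices[i]   # stitch the slices into the big array
    end
    return out
end
```

One thing I'm unsure about is that with only 4 values of a and 3 workers, the load balancing of this version seems coarse, which is partly why I'm asking.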
Any input would be greatly appreciated, as I've been trying to parallelize
this for a while and seem to be at a point where I'm just getting more
confused the harder I try.