Thanks, that's a great suggestion! Writing:
function solveall(agrid, bgrid, cgrid, dgrid)
    @sync @parallel for a = 1:length(agrid)
        ...
    end
    return result
end

@time solveall(agrid, bgrid, cgrid, dgrid)
reduces the time to ~4.3s, just about half that of the single-core
implementation!
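For anyone finding this thread later, here's what the full version might look like once the snippets above are pieced together (elided bodies filled in from the original post below; Julia 0.3-era syntax, as used throughout this thread). The key changes versus the original attempt: the grids are formal arguments, the *outer* loop is the one parallelized so each worker gets a whole 30x40x40 slab rather than a handful of inner iterations, and @sync blocks until all workers have written their results:

```julia
addprocs(3)

agrid = linspace(1, 4, 4)
bgrid = linspace(-1.05, 1.05, 30)
cgrid = linspace(-0.1, 0.1, 40)
dgrid = linspace(0.5, 1000, 40)

# calculate() has to exist on every worker, hence @everywhere
@everywhere function calculate(a, b, c, d)
    quadgk(cos, -b*10π, c*10π)[1] + quadgk(sin, -b*10π, c*10π)[1]*d
end

function solveall(agrid, bgrid, cgrid, dgrid)
    # SharedArray so all workers can write into the same result matrix
    result = SharedArray(Float64,
                         (length(agrid), length(bgrid), length(cgrid), length(dgrid)),
                         pids=procs())
    # Parallelize the coarsest loop; @sync waits for every worker to finish
    # before the function returns
    @sync @parallel for a = 1:length(agrid)
        for b = 1:length(bgrid), c = 1:length(cgrid), d = 1:length(dgrid)
            result[a, b, c, d] = calculate(agrid[a], bgrid[b], cgrid[c], dgrid[d])
        end
    end
    return result
end

@time solveall(agrid, bgrid, cgrid, dgrid)
```

Parallelizing the outer loop means each of the 4 iterations is a large chunk of work, so the per-task scheduling overhead is paid only a few times instead of once per inner-loop sweep — which is presumably where the extra speedup comes from.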
On Friday, March 20, 2015 at 4:42:08 PM UTC, Patrick O'Leary wrote:
>
> Try making the grids formal arguments to solveall():
>
> function solveall(agrid, bgrid, cgrid, dgrid)
>     ...
> end
>
> @time solveall(agrid, bgrid, cgrid, dgrid)
>
> Then you should be able to switch the loop you're parallelizing over.
>
> You probably also need a @sync somewhere to ensure all the workers are
> done before returning.
>
> On Friday, March 20, 2015 at 11:07:00 AM UTC-5, Nils Gudat wrote:
>>
>> I'm still having problems understanding the basic concepts of
>> parallelization in Julia. It seems to me that the examples in the
>> documentation and those that I found elsewhere on the web don't really
>> reflect my use case, so I'm wondering whether I'm approaching the problem
>> from the right angle. I've written a short piece of code that illustrates
>> what I'm trying to do; basically it's a large number of small calculations,
>> the results of which have to be stored in one large matrix.
>> Here's the example:
>>
>> addprocs(3)
>>
>> agrid = linspace(1,4,4)
>> bgrid = linspace(-1.05, 1.05, 30)
>> cgrid = linspace(-0.1, 0.1, 40)
>> dgrid = linspace(0.5, 1000, 40)
>>
>> result = SharedArray(Float64, (size(agrid,1), size(bgrid,1),
>> size(cgrid,1), size(dgrid,1)), pids=procs())
>>
>> @everywhere function calculate(a,b,c,d)
>> quadgk(cos, -b*10π, c*10π)[1] + quadgk(sin, -b*10π, c*10π)[1]*d
>> end
>>
>> function solveall()
>>     for a = 1:length(agrid)
>>         for b = 1:length(bgrid)
>>             for c = 1:length(cgrid)
>>                 @parallel for d = 1:length(dgrid)
>>                     result[a,b,c,d] = calculate(agrid[a], bgrid[b], cgrid[c], dgrid[d])
>>                 end
>>             end
>>         end
>>     end
>>     return result
>> end
>>
>> @time solveall()
>>
>> Unfortunately, the speedup from parallelizing the inner loop isn't great
>> (going from ~9s to ~7.5s on my machine), so I'm wondering whether this is
>> actually the best way of implementing the parallelization. My original
>> idea was to somehow parallelize the outer loop, so that each processor
>> returns a 30x40x40 array, but I don't see how I can get the worker
>> processors to run the inner loops correctly.
>>
>> Any input would be greatly appreciated, as I've been trying to
>> parallelize this for a while and seem to be at a point where I'm just
>> getting more confused the harder I try.
>>
>