Thanks, that's a great suggestion! Writing:

function solveall(agrid, bgrid, cgrid, dgrid)
  @sync @parallel for a = 1:length(agrid)
    ...
  end
  return result
end

@time solveall(agrid, bgrid, cgrid, dgrid)

reduces the time to ~4.3s, about half that of the single-core 
implementation!
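For anyone finding this thread later, here's a sketch of the full function with both suggestions combined: the grids passed as formal arguments, the parallel loop moved to the outermost index, and `@sync` to wait for the workers. It assumes `calculate` is defined `@everywhere` as in the original post, and uses the same 0.3-era `@parallel`/`SharedArray` syntax as the rest of the thread:

```julia
# Sketch only: grids are now formal arguments, and the SharedArray is
# allocated inside the function. @sync blocks until all workers finish
# before result is returned.
function solveall(agrid, bgrid, cgrid, dgrid)
  result = SharedArray(Float64, (length(agrid), length(bgrid),
                                 length(cgrid), length(dgrid)), pids=procs())
  @sync @parallel for a = 1:length(agrid)
    for b = 1:length(bgrid), c = 1:length(cgrid), d = 1:length(dgrid)
      # calculate must be defined @everywhere so workers can call it
      result[a, b, c, d] = calculate(agrid[a], bgrid[b], cgrid[c], dgrid[d])
    end
  end
  return result
end
```

Note that with only four points in `agrid` the work is split into at most four chunks, so the speedup is capped by however evenly those chunks land on the three workers.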

On Friday, March 20, 2015 at 4:42:08 PM UTC, Patrick O'Leary wrote:
>
> Try making the grids formal arguments to solveall():
>
> function solveall(agrid, bgrid, cgrid, dgrid)
>    ...
> end
>
> @time solveall(agrid, bgrid, cgrid, dgrid)
>
> Then you should be able to switch the loop you're parallelizing over.
>
> You probably also need a @sync somewhere to ensure all the workers are 
> done before returning.
>
> On Friday, March 20, 2015 at 11:07:00 AM UTC-5, Nils Gudat wrote:
>>
>> I'm still having problems understanding the basic concepts of 
>> parallelization in Julia. It seems to me that the examples in the 
>> documentation and those that I found elsewhere on the web don't really 
>> reflect my usage case, so I'm wondering whether I'm approaching the problem 
>> from the right angle. I've written a short piece of code that illustrates 
>> what I'm trying to do; basically it's a large number of small calculations, 
>> the results of which have to be stored in one large matrix.
>> Here's the example:
>>
>> addprocs(3)
>>
>> agrid = linspace(1,4,4)
>> bgrid = linspace(-1.05, 1.05, 30)
>> cgrid = linspace(-0.1, 0.1, 40)
>> dgrid = linspace(0.5, 1000, 40)
>>
>> result = SharedArray(Float64, (size(agrid,1), size(bgrid,1), 
>> size(cgrid,1), size(dgrid,1)), pids=procs())
>>
>> @everywhere function calculate(a,b,c,d)
>>   quadgk(cos, -b*10π, c*10π)[1] + quadgk(sin, -b*10π, c*10π)[1]*d
>> end
>>
>> function solveall()
>>   for a = 1:length(agrid)
>>     for b = 1:length(bgrid)
>>       for c = 1:length(cgrid)
>>         @parallel for d = 1:length(dgrid)
>>           result[a,b,c,d] = calculate(agrid[a], bgrid[b], cgrid[c], 
>> dgrid[d])
>>         end
>>       end
>>     end
>>   end
>>   return result
>> end
>>
>> @time solveall()
>>
>> Unfortunately, the speedup from parallelizing the inner loop isn't great 
>> (going from ~9s to ~7.5s on my machine), so I'm wondering whether this is 
>> actually the best way of implementing the parallelization. My original 
>> idea was to somehow parallelize the outer loop, so that each processor 
>> returns a 30x40x40 array, but I don't see how I can get the worker 
>> processors to run the inner loops correctly.
>>
>> Any input would be greatly appreciated, as I've been trying to 
>> parallelize this for a while and seem to be at a point where I'm just 
>> getting more confused the harder I try.
>>
>
