OK thanks.
I didn't consider @parallel, probably because I thought of it as suited only
to large numbers of trials with small work units, whereas I thought of pmap
as better suited to relatively few trials with longer-running work units.
In any case, @parallel works fine.
Old pmap code skeleton:
trialCounts = pmap(MySimulation, [1:numTrials], fill(numIter, numTrials))
totalCounts = sum(trialCounts)
New @parallel code:
totalCounts = @parallel (+) for trial = 1:numTrials
    MySimulation(trial, numIter)
end
However, I have 2 questions:
1. When I try to modify the @parallel code to assign the result to a variable
inside the loop, I get an error.
I don't understand the @parallel macro, but I'm guessing I can't assign to a
variable inside the loop?
totalCounts = @parallel (+) for trial = 1:numTrials
    trialCount = MySimulation(trial, numIter)
    print(trialCount) # or some other processing with trialCount
end
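One possible explanation, assuming the macro reduces the value of the last
expression in the loop body: print returns nothing, so (+) ends up being
applied to nothing rather than to trialCount. The sketch below targets the
Julia 0.3/0.4 API used in this thread; MySimulation is a hypothetical
deterministic stand-in for the real simulation, and run_trials is a made-up
wrapper so that numIter is a local captured by the loop body.

```julia
# Assumes the Julia 0.3/0.4 API used in this thread (@parallel, @everywhere).
# MySimulation is a hypothetical stand-in for the real simulation.
@everywhere MySimulation(trial, numIter) = trial * numIter

function run_trials(numTrials, numIter)
    totalCounts = @parallel (+) for trial = 1:numTrials
        trialCount = MySimulation(trial, numIter)
        println(trialCount)   # side effect; runs on whichever process handles this trial
        trialCount            # last expression: the value handed to (+)
    end
    return totalCounts
end

println(run_trials(4, 10))
```

If that assumption holds, assigning inside the loop is fine as long as the
body still ends with the value to be reduced.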
2. Again, I don't understand the @parallel macro, but it seems to call
preduce (see below), which seems to collect results in an array of size
numTrials / nworkers().
If so, the memory requirement still depends on the number of trials.
I was trying to limit the results array to the number of workers,
independent of the number of trials.
Is my understanding here correct?
function preduce(reducer, f, N::Int)
    chunks = splitrange(N, nworkers())
    results = cell(length(chunks))
    for i in 1:length(chunks)
        results[i] = @spawn f(first(chunks[i]), last(chunks[i]))
    end
    mapreduce(fetch, reducer, results)
end
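One way to check this reading is to count the chunks. The sketch below
assumes splitrange(N, np) returns np contiguous ranges covering 1:N (I
haven't verified that); chunk_ranges is a hypothetical stand-in written for
illustration, not the Base implementation.

```julia
# Hypothetical stand-in for splitrange: split 1:N into np contiguous ranges.
function chunk_ranges(N, np)
    chunks = UnitRange{Int}[]
    base, extra = divrem(N, np)
    lo = 1
    for i in 1:np
        # the first `extra` chunks take one additional element
        hi = lo + base - 1 + (i <= extra ? 1 : 0)
        push!(chunks, lo:hi)
        lo = hi + 1
    end
    return chunks
end

# The number of chunks depends only on np, not on N.
for N in (10, 1000, 100000)
    println(N, " trials -> ", length(chunk_ranges(N, 4)), " chunks")
end
```

Under that assumption, results in preduce would hold one entry per worker,
and each entry would be a single already-reduced value for that worker's
range of trials, rather than numTrials / nworkers() separate results.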
Greg
On Friday, July 10, 2015 at 12:24:16 PM UTC+10, Jameson wrote:
> this sounds like you may be looking for the `@parallel reduce_fn for itm =
> lst; f(itm); end` map-reducer construct (described on the same page)?
>
> On Thu, Jul 9, 2015 at 9:23 PM Greg Plowman wrote:
>
>> I have been using pmap for simulations and find it very useful
>> and convenient.
>> However, sometimes I want to run a large number of trials where the
>> results are also large. This requires a lot of memory to hold the returned
>> results.
>> If I'm only interested in the final, reduced result, and not concerned
>> with the raw individual trial results, then returning the entire array
>> seems unnecessary.
>> I want to reduce on the fly, avoiding the need to keep all trial results.
>> I want to run more trials than workers for load balancing. (And possibly
>> because I'm interested in summary results of individual trials, not the
>> entire raw results).
>>
>> With the help of the simplified version of pmap presented in the docs (
>> http://julia.readthedocs.org/en/latest/manual/parallel-computing/), I
>> have a tenuous understanding of how pmap works. Although the actual
>> implementation scares me.
>> In any case, I was wondering before I progress further, whether a
>> modified version of pmap could be designed to reduce on-the-fly.
>> Here are some modifications to the simplified, documentation version.
>> Would something like this work? I'm worried about the shared updates to
>> final_result. Will these happen orderly? What else should I consider?
>>
>>
>> # was: function pmap(f, lst)
>> function pmap_reduce(f, lst, reduce_fn)  # extra argument is reduce function
>>     np = nprocs()  # determine the number of processes available
>>     n = length(lst)
>>
>>     # was: results = cell(n)
>>     results = cell(np)       # hold results for currently executing procs only
>>     final_result = cell(1)   # holds the final, reduced result
>>
>>     i = 1
>>     # function to produce the next work item from the queue.
>>     # in this case it's just an index.
>>     nextidx() = (idx=i; i+=1; idx)
>>
>>     @sync begin
>>         for p=1:np
>>             if p != myid() || np == 1
>>                 @async begin
>>                     while true
>>                         idx = nextidx()
>>                         if idx > n
>>                             break
>>                         end
>>
>>                         # was: results[idx] = remotecall_fetch(p, f, lst[idx])
>>                         # return results into array indexed by proc
>>                         results[p] = remotecall_fetch(p, f, lst[idx])
>>                         # combine results[p] into final_result using reduction function
>>                         reduce_fn(final_result, results[p])
>>                     end
>>                 end
>>             end
>>         end
>>     end
>>
>>     # was: results
>>     final_result  # return reduced result
>> end
>>
>>
>>