You may find the "more extended and complex" example informative:
http://docs.julialang.org/en/latest/manual/parallel-computing/#shared-arrays

You'd need to set something up more complex for load-balancing, but, e.g., 
`assignment = SharedArray(Int, 2, nworkers())` would be a nice tool (worker i 
takes a chunk of indexes from assignment[1:i]:assignment[2:i]).

--Tim

On Thursday, August 27, 2015 09:40:33 PM Greg Plowman wrote:
> Firstly, I hope it's OK to revive this thread.
> 
> @parallel doesn't do dynamic load balancing.
> So I'm still interested in a pmap_reduce() or equivalent which does dynamic
> allocation to workers AND reduces on the fly.
> 
> Just to be clear:
> The "chunks" of work given to each worker might vary in execution time.
> The workers might vary in their execution speed.
> 
> It's difficult to optimally split job into chunks to assign to workers
> upfront.
> 
> Although pmap does assign chunks of work dynamically, it's still not
> optimal if the number of chunks is of the same order of the number of
> workers.
> It's very possible a slow worker will be left running a long "chunk". Or
> simply that only some workers will be required to finish any remaining
> chunks, leaving other workers idle.
> 
> It seems more granularity may help. i.e. Splitting job into many chunks so
> that #chunks >> #workers.
> But then pmap becomes unviable because it returns an array of results for
> each chunk, which requires too much memory.
> I don't need the intermediate results, so I could reduce on the fly.
> 
> Hence my interest in a version of pmap which returns the final reduced
> result only.
> Is it possible to comment on my original question about a modified pmap?
> 
> Or have I misunderstood pmap &/or @parallel? Or are there other options?
> 
> Thanks, Greg
> 
> On Friday, July 17, 2015 at 3:09:03 PM UTC+10, Greg Plowman wrote:
> > Thanks Jameson.
> > 
> > My error was from the line: trialCounts = MySimulation(trial, numIter)
> > Error message: trialCounts not defined
> > 
> > This had something to do with the variable name (used later, perhaps if
> > statements guarding scope and now local)
> > In any case, if I change variable name, it works: counts = MySimulation(
> > trial, numIter)
> > 
> > 
> > Thanks again for your help.
> > 
> > Greg
> > 
> > 
> > 
> > On Friday, July 17, 2015 at 2:48:44 PM UTC+10, Jameson wrote:
> > 
> > i believe that length(chunks) will be <= nworkers()
> > 
> > the last statement of the for loop should be the "return" value from that
> > iteration. (for example: the variable name `trialCount`).
> > 
> > On Fri, Jul 17, 2015 at 12:12 AM Greg Plowman <[email protected]> wrote:
> > 
> > OK thanks.
> > I didn't consider @parallel (probably because I considered it for only
> > large trials of small work units, whereas I considered pmap more suited to
> > relatively small trials of longer running work units)
> > In any case, @parallel works fine.
> > 
> > Old pmap code skeleton:
> > trialCounts = pmap(MySimulation, [1:numTrials], fill(numIter, numTrials))
> > totalCounts = sum(trialCounts)
> > 
> > New @parallel code
> > totalCounts = @parallel (+) for trial = 1:numTrials
> > 
> >     MySimulation(trial, numIter)
> > 
> > end
> > 
> > 
> > 
> > However, I have 2 questions:
> > 
> > 1. When I try to modify @parallel code to assign the result to a variable
> > inside the loop, I get an error.
> > I don't understand the @parallel macro, but I'm guessing I can't assign to
> > variable inside loop?
> > 
> > totalCounts = @parallel (+) for trial = 1:numTrials
> > 
> >     trialCount = MySimulation(trial, numPlays)
> >     print(trialCount) # or some other processing with trialCount
> > 
> > end
> > 
> > 
> > 
> > 2. Again I don't @parallel macro but it seems to call preduce (see below),
> > which seems to collect results in an array of size numTrials / nworkers().
> > If this is so, then memory requirement still has a dependency on the
> > number of trials.
> > I was trying to limit the results array to the number of workers,
> > independent of number of trials.
> > Is my understanding here correct?
> > 
> > function preduce(reducer, f, N::Int)
> > 
> >     chunks = splitrange(N, nworkers())
> >     results = cell(length(chunks))
> >     for i in 1:length(chunks)
> >     
> >         results[i] = @spawn f(first(chunks[i]), last(chunks[i]))
> >     
> >     end
> >     mapreduce(fetch, reducer, results)
> > 
> > end
> > 
> > 
> > 
> > Greg
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > On Friday, July 10, 2015 at 12:24:16 PM UTC+10, Jameson wrote:
> > 
> > this sounds like you may be looking for the `@parallel reduce_fn for itm =
> > lst; f(itm); end` map-reducer construct (described on the same page)?
> > 
> > On Thu, Jul 9, 2015 at 9:23 PM Greg Plowman <[email protected]> wrote:
> > 
> > I have been using pmap for simulations and find it very useful
> > and convenient.
> > However, sometimes I want to run a large number of trials where the
> > results are also large. This requires a lot of memory to hold the returned
> > results.
> > If I'm only interested the final, reduced result, and not concerned with
> > the raw individual trial results, then returning entire array seems
> > unnecessary.
> > I want to reduce on the fly, avoiding the need to keep all trial results.
> > I want to run more trials than workers for load balancing. (And possibly
> > because I'm interested in summary results of individual trials, not the
> > entire raw results).
> > 
> > With the help of the simplified version of pmap presented in the docs (
> > http://julia.readthedocs.org/en/latest/manual/parallel-computing/), I
> > have a tenuous understanding of how pmap works. Although the actual
> > implementation scares me.
> > In any case, I was wondering before I progress further, whether a modified
> > version of pmap could be designed to reduce on-the-fly.
> > Here are some modifications to the simplified, documentation version.
> > Would something like this work? I'm worried about the shared updates to
> > final_result. Will these happen orderly? What else should I consider?
> > 
> > 
> > * function pmap(f, lst)
> > 
> > * function pmap_reduce(f, lst, reduce_fn)  # extra argument is reduce
> > function
> > 
> >     np = nprocs()  # determine the number of processes available
> >     n = length(lst)
> > 
> > *   results = cell(n)
> > *   results = cell(np)  # hold results for currently executing procs only
> > *   final_result = cell(1)  # holds the final, reduced result
> > 
> >     i = 1
> >     # function to produce the next work item from the queue.
> >     # in this case it's just an index.
> >     nextidx() = (idx=i; i+=1; idx)
> >     
> >     @sync begin
> >     
> >         for p=1:np
> >         
> >             if p != myid() || np == 1
> >             
> >                 @async begin
> >                 
> >                     while
> > 
> > ...

Reply via email to