@parallel is efficient for executing a large number of small computations, while pmap is better suited to a small number of complex computations.
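For a quick illustration of that rule of thumb, here is a minimal sketch (expensive_fit is a hypothetical stand-in for a costly per-task computation):

    # many tiny computations: @parallel amortizes scheduling across workers
    total = @parallel (+) for i in 1:10^6
        i % 2
    end

    # a handful of expensive computations: pmap ships one call per task
    @everywhere expensive_fit(x) = (sleep(1); x^2)  # hypothetical stand-in
    results = pmap(expensive_fit, 1:8)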
What is happening in the mypmap case is that remotecall_fetch is being called 10000 times, i.e., 10000 roundtrips to other processors. Not very efficient. I would also point out that in your particular example of @parallel (+), you will find that sum(a::AbstractArray) is much faster than @parallel. However, say you wanted to initialize the shared array in parallel; then you would find that

    s = SharedArray(Int, 10^8)
    @parallel for i in 1:10^8
        s[i] = rand(1:10)
    end

quite efficiently uses all workers in initializing the shared array. One way to avoid the per-element roundtrips in mypmap is to hand each worker a whole chunk of indices per remotecall_fetch; a sketch follows below the quoted message.

On Sun, Jan 26, 2014 at 12:55 PM, Madeleine Udell <[email protected]> wrote:

> When using SharedArrays with pmap, I'm getting an increase in memory
> usage and time proportional to the number of tasks. This doesn't happen
> when using @parallel. What's the right way to pass shared arrays to
> workers using functional syntax?
>
> (code for file q3.jl pasted below and also attached; the first timing
> result refers to a @parallel implementation, the second to a pmap-style
> implementation)
>
> ᐅ julia -p 10 q3.jl 100
> elapsed time: 1.14932906 seconds (12402424 bytes allocated)
> elapsed time: 0.097900614 seconds (2716048 bytes allocated)
> ᐅ julia -p 10 q3.jl 1000
> elapsed time: 1.140016584 seconds (12390724 bytes allocated)
> elapsed time: 0.302179888 seconds (21641260 bytes allocated)
> ᐅ julia -p 10 q3.jl 10000
> elapsed time: 1.173121314 seconds (12402424 bytes allocated)
> elapsed time: 2.429918636 seconds (197840960 bytes allocated)
>
> n = int(ARGS[1])
> arr = randn(n)
>
> function make_shared(a::AbstractArray, pids=workers())
>     sh = SharedArray(typeof(a[1]), size(a), pids=pids)
>     sh[:] = a[:]
>     return sh
> end
>
> arr = make_shared(arr)
> tasks = 1:n
>
> @time begin
>     @parallel (+) for i in tasks
>         arr[i]
>     end
> end
>
> @everywhere function f(task, arr)
>     arr[task]
> end
>
> function mypmap(f::Function, tasks, arr)
>     # (check whether this resends the shared data every time; it shouldn't)
>     np = nprocs()  # determine the number of processes available
>     n = length(tasks)
>     results = 0
>     i = 1
>     # function to produce the next work item from the queue;
>     # in this case it's just an index
>     nextidx() = (idx = i; i += 1; idx)
>     @sync begin
>         for p = 1:np
>             if p != myid() || np == 1
>                 @async begin
>                     while true
>                         idx = nextidx()
>                         if idx > n
>                             break
>                         end
>                         task = tasks[idx]
>                         results += remotecall_fetch(p, f, task, arr)
>                     end
>                 end
>             end
>         end
>     end
>     results
> end
>
> @time mypmap(f, tasks, arr)
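To make the roundtrip point concrete, here is a minimal sketch of a chunked mypmap-style sum (the names sum_chunk and chunked_sum are hypothetical; it assumes the remotecall_fetch(pid, f, args...) argument order used above):

    @everywhere sum_chunk(arr, r) = sum(arr[r])

    function chunked_sum(arr, n)
        ps = workers()
        b = iceil(n / length(ps))  # chunk size, rounded up
        # one contiguous index range per worker
        ranges = [(1 + (k-1)*b):min(k*b, n) for k in 1:length(ps)]
        parts = cell(length(ps))   # one slot per worker's partial sum
        @sync begin
            for k in 1:length(ps)
                @async begin
                    # one roundtrip per worker instead of one per element
                    parts[k] = remotecall_fetch(ps[k], sum_chunk, arr, ranges[k])
                end
            end
        end
        sum(parts)
    end

    @time chunked_sum(arr, n)

Giving each task its own slot in parts also sidesteps the lost-update hazard in results += remotecall_fetch(...), where a task switch can occur between reading and writing results.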
