Thanks, that makes sense. After decoupling the parameters in my code, I see that the increase in memory is proportional to the number of tasks, not the size of the shared memory block.
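In case it helps anyone searching this thread later, here is a minimal sketch of what the "decoupling" amounts to (hypothetical names, written against the 0.2-era API used in this thread): closing over a large array in the mapped function causes the array to be serialized with every remote call, while passing it as an explicit argument alongside a SharedArray avoids the per-task copy.

```julia
# Anti-pattern: the mapped function closes over `big`, so the whole
# array travels with the closure on every remote call.
big = randn(10^6)
f_closure = task -> big[task]

# Decoupled: the array is an explicit argument. Combined with a
# SharedArray, workers then read the same memory block instead of
# receiving a fresh copy per task, e.g.
#   remotecall_fetch(p, f_decoupled, task, sh)
f_decoupled(task, a) = a[task]
```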
On Sunday, January 26, 2014 1:28:18 AM UTC-8, Amit Murthy wrote:
>
> @parallel is efficient at executing a large number of small computations,
> while pmap is better for a small number of complex computations.
>
> What is happening in the mypmap case is that remotecall_fetch is being
> called 10000 times, i.e., 10000 roundtrips to other processors. Not very
> efficient.
>
> I would also point out that in your particular example of @parallel (+),
> you will find that sum(a::AbstractArray) is much faster than @parallel.
>
> However, say you wanted to initialize the shared array in parallel, then
> you would find that
>
>     s = SharedArray(Int, 10^8)
>     @parallel for i in 1:10^8
>         s[i] = rand(1:10)
>     end
>
> quite efficiently uses all workers in initializing the shared array.
>
> On Sun, Jan 26, 2014 at 12:55 PM, Madeleine Udell <[email protected]> wrote:
>
>> When using SharedArrays with pmap, I'm getting an increase in memory
>> usage and time proportional to the number of tasks. This doesn't happen
>> when using @parallel. What's the right way to pass shared arrays to
>> workers using functional syntax?
>>
>> (code for file q3.jl pasted below and also attached; the first timing
>> result refers to a @parallel implementation, the second to a pmap-style
>> implementation)
>>
>>     ᐅ julia -p 10 q3.jl 100
>>     elapsed time: 1.14932906 seconds (12402424 bytes allocated)
>>     elapsed time: 0.097900614 seconds (2716048 bytes allocated)
>>     ᐅ julia -p 10 q3.jl 1000
>>     elapsed time: 1.140016584 seconds (12390724 bytes allocated)
>>     elapsed time: 0.302179888 seconds (21641260 bytes allocated)
>>     ᐅ julia -p 10 q3.jl 10000
>>     elapsed time: 1.173121314 seconds (12402424 bytes allocated)
>>     elapsed time: 2.429918636 seconds (197840960 bytes allocated)
>>
>>     n = int(ARGS[1])
>>     arr = randn(n)
>>
>>     function make_shared(a::AbstractArray, pids=workers())
>>         sh = SharedArray(typeof(a[1]), size(a), pids=pids)
>>         sh[:] = a[:]
>>         return sh
>>     end
>>
>>     arr = make_shared(arr)
>>     tasks = 1:n
>>
>>     @time begin
>>         @parallel (+) for i in tasks
>>             arr[i]
>>         end
>>     end
>>
>>     @everywhere function f(task, arr)
>>         arr[task]
>>     end
>>
>>     function mypmap(f::Function, tasks, arr)
>>         # (if this resends the shared data every time, it shouldn't)
>>         np = nprocs()   # determine the number of processes available
>>         n = length(tasks)
>>         results = 0
>>         i = 1
>>         # function to produce the next work item from the queue;
>>         # in this case it's just an index.
>>         nextidx() = (idx = i; i += 1; idx)
>>         @sync begin
>>             for p = 1:np
>>                 if p != myid() || np == 1
>>                     @async begin
>>                         while true
>>                             idx = nextidx()
>>                             if idx > n
>>                                 break
>>                             end
>>                             task = tasks[idx]
>>                             results += remotecall_fetch(p, f, task, arr)
>>                         end
>>                     end
>>                 end
>>             end
>>         end
>>         results
>>     end
>>
>>     @time mypmap(f, tasks, arr)
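On Amit's point about the 10000 roundtrips: one hedged way to cut them down, using a hypothetical `chunk_sum` helper in the same 0.2-era style as the code above, is to hand each remotecall_fetch a range of indices instead of a single one, so n tasks become n/chunksize roundtrips.

```julia
# In the thread's setting this would be defined with @everywhere so
# that workers can call it; each remote call now covers a whole range.
chunk_sum(r, a) = sum(a[r])

n = 10^4
chunksize = 1000
chunks = [i:min(i + chunksize - 1, n) for i in 1:chunksize:n]
# Feeding `chunks` to mypmap in place of 1:n, i.e.
#   mypmap(chunk_sum, chunks, arr)
# makes n/chunksize roundtrips instead of n.
```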
