This is why, with my original implementation of SharedArrays (oh-so-long-ago), 
I created pmap_bw, to do busy-wait on the return value of a SharedArray 
computation. The amusing part is that you can use a SharedArray to do the 
synchronization among processes.
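
A minimal sketch of that synchronization trick, assuming the 0.3-era API used elsewhere in this thread (a hypothetical illustration, not the actual pmap_bw implementation; `data`, `done`, and the worker closure are made up for the example):

```julia
data = Base.shmem_fill(1, (1000, 1000))
done = Base.shmem_fill(0, (nworkers(),))   # one "finished" flag per worker

for (i, p) in enumerate(workers())
    # Each worker computes on the shared data, then raises its own flag.
    remotecall(p, (d, f, idx) -> (sum(d); f[idx] = 1), data, done, i)
end

# Busy-wait on the flag array itself instead of fetching RemoteRefs.
while sum(done) < nworkers()
    yield()   # keep the event loop running while we spin
end
```

The point is that the completion signal travels through shared memory, so the waiting process never has to route a fetch through pid 1.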

--Tim

On Thursday, March 27, 2014 06:11:12 PM Amit Murthy wrote:
> There is a pattern here. For a set of pids, the cumulative sum is ~40
> milliseconds. In a SharedArray, RemoteRefs to the shmem mappings on each of
> the workers are maintained on the creating pid (in this case 1). I think the
> workers are referring back to pid 1 to fetch the local mapping when the
> shared array object is passed in the remotecall_fetch call, and hence all
> the workers are stuck waiting for pid 1 to become free to service these calls.
> 
> On Thu, Mar 27, 2014 at 5:58 PM, Amit Murthy <[email protected]> wrote:
> > Some more weirdness
> > 
> > Starting with julia -p 8
> > 
> > A = Base.shmem_fill(1, (1000, 1000))
> > 
> > Using 2 workers:
> > for i in 1:100
> >     t1 = time(); p = 2 + (i % 2); remotecall_fetch(p, x->1, A); t2 = time()
> >     println("@ $p ", int((t2 - t1) * 1000))
> > end
> > 
> > prints
> > 
> > ...
> > @ 3 8
> > @ 2 32
> > @ 3 8
> > @ 2 32
> > @ 3 8
> > @ 2 32
> > @ 3 8
> > @ 2 32
> > 
> > 
> > Notice that pid 2 always takes 32 milliseconds while pid 3 always takes 8
> > milliseconds.
> > 
> > 
> > 
> > With 4 workers:
> > 
> > for i in 1:100
> >     t1 = time(); p = 2 + (i % 4); remotecall_fetch(p, x->1, A); t2 = time()
> >     println("@ $p ", int((t2 - t1) * 1000))
> > end
> > 
> > ...
> > @ 2 31
> > @ 3 4
> > @ 4 4
> > @ 5 1
> > @ 2 31
> > @ 3 4
> > @ 4 4
> > @ 5 1
> > @ 2 31
> > @ 3 4
> > @ 4 4
> > @ 5 1
> > @ 2 31
> > 
> > 
> > Now pid 2 always takes 31 milliseconds, pids 3 and 4 take 4 milliseconds
> > each, and pid 5 takes 1 millisecond.
> > 
> > With 8 workers:
> > 
> > for i in 1:100
> >     t1 = time(); p = 2 + (i % 8); remotecall_fetch(p, x->1, A); t2 = time()
> >     println("@ $p ", int((t2 - t1) * 1000))
> > end
> > 
> > ...
> > @ 2 20
> > @ 3 4
> > @ 4 1
> > @ 5 3
> > @ 6 4
> > @ 7 1
> > @ 8 2
> > @ 9 4
> > @ 2 20
> > @ 3 4
> > @ 4 1
> > @ 5 3
> > @ 6 4
> > @ 7 1
> > @ 8 2
> > @ 9 4
> > @ 2 20
> > @ 3 4
> > @ 4 1
> > @ 5 3
> > @ 6 4
> > @ 7 1
> > @ 8 3
> > @ 9 4
> > @ 2 20
> > @ 3 4
> > @ 4 1
> > @ 5 3
> > @ 6 4
> > 
> > 
> > Pid 2 always takes 20 milliseconds, while the rest are fairly consistent too.
> > 
> > Any explanations?
> > 
> > On Thu, Mar 27, 2014 at 5:24 PM, Amit Murthy <[email protected]> wrote:
> >> I think the code does not do what you want.
> >> 
> >> In the non-shared case you are sending a 10^6-element integer array over
> >> the network 1000 times and summing it as many times. Most of the time is
> >> network traffic. Reduce 'n' to, say, 10, and you will see what I mean.
> >> 
> >> In the shared case you are not sending the array over the network, but you
> >> are still summing the entire array 1000 times. Some of the remotecall_fetch
> >> calls seem to be taking 40 milliseconds of extra time, which adds to the
> >> total.
> >> 
> >> The shared time of 6 seconds being less than the 15 seconds for the
> >> non-shared case seems to be incidental.
> >> 
> >> I don't yet have an explanation for the extra 40 milliseconds per
> >> remotecall_fetch (for some calls only) in the shared case.
> >> 
> >> On Thu, Mar 27, 2014 at 2:50 PM, Mikael Simberg <[email protected]> wrote:
> >>> Hi,
> >>> I'm having some trouble figuring out exactly how I'm supposed to use
> >>> SharedArrays - I might just be misunderstanding them or else something
> >>> odd is happening with them.
> >>> 
> >>> I'm trying to do some parallel computing which looks a bit like this
> >>> test case:
> >>> 
> >>> function createdata(shared)
> >>>     const n = 1000
> >>>     if shared
> >>>         A = SharedArray(Uint, (n, n))
> >>>     else
> >>>         A = Array(Uint, (n, n))
> >>>     end
> >>>     for i = 1:n, j = 1:n
> >>>         A[i, j] = rand(Uint)
> >>>     end
> >>>     return n, A
> >>> end
> >>> 
> >>> function mainfunction(r; shared = false)
> >>>     n, A = createdata(shared)
> >>> 
> >>>     i = 1
> >>>     nextidx() = (idx = i; i += 1; idx)
> >>> 
> >>>     @sync begin
> >>>         for p in workers()
> >>>             @async begin
> >>>                 while true
> >>>                     idx = nextidx()
> >>>                     if idx > r
> >>>                         break
> >>>                     end
> >>>                     found, s = remotecall_fetch(p, parfunction, n, A)
> >>>                 end
> >>>             end
> >>>         end
> >>>     end
> >>> end
> >>> 
> >>> function parfunction(n::Int, A::Array{Uint, 2})
> >>>     # possibly do some other computation here independent of shared arrays
> >>>     s = sum(A)
> >>>     return false, s
> >>> end
> >>> 
> >>> function parfunction(n::Int, A::SharedArray{Uint, 2})
> >>>     s = sum(A)
> >>>     return false, s
> >>> end
> >>> 
> >>> If I then start julia with e.g. two worker processes, so julia -p 2, the
> >>> following happens:
> >>> 
> >>> julia> require("testpar.jl")
> >>> 
> >>> julia> @time mainfunction(1000, shared = false)
> >>> elapsed time: 15.717117365 seconds (8448701068 bytes allocated)
> >>> 
> >>> julia> @time mainfunction(1000, shared = true)
> >>> elapsed time: 6.068758627 seconds (56713996 bytes allocated)
> >>> 
> >>> julia> rmprocs([2, 3])
> >>> 
> >>> :ok
> >>> 
> >>> julia> @time mainfunction(1000, shared = false)
> >>> elapsed time: 0.717638344 seconds (40357664 bytes allocated)
> >>> 
> >>> julia> @time mainfunction(1000, shared = true)
> >>> elapsed time: 0.702174085 seconds (32680628 bytes allocated)
> >>> 
> >>> So, with a normal array it's slow as expected, and it is faster with the
> >>> shared array. But with the normal array CPU usage is 100% on two cores,
> >>> while with the shared array CPU usage spikes for a fraction of a second
> >>> and then sits at around 10% for the remaining nearly 6 seconds. Can
> >>> anyone reproduce this? Am I just doing something wrong with shared
> >>> arrays?
> >>> 
> >>> Slightly related note: is there now a way to create a random shared
> >>> array? https://github.com/JuliaLang/julia/pull/4939 and the latest docs
> >>> don't mention this.
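
On that last question: if I remember the API from that PR correctly, the SharedArray constructor takes an `init` keyword that runs on every participating worker, which can be used to fill the array in parallel. A hypothetical sketch with the 0.3-era API, each worker writing only its own local chunk:

```julia
# Fill a SharedArray with random values in parallel: the init function
# runs on each participating worker, and localindexes(S) gives every
# worker a disjoint index range to write.
S = SharedArray(Uint, (1000, 1000);
                init = S -> (for i in localindexes(S)
                                 S[i] = rand(Uint)
                             end))
```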
