There is a pattern here. For each set of pids, the per-call times sum to roughly 40 milliseconds. In a SharedArray, RemoteRefs to the shmem mappings on each of the workers are maintained on the creating pid (in this case 1). I think the workers are referring back to pid 1 to fetch their local mapping when the shared array object is passed in the remotecall_fetch call, and hence they are all stuck waiting for pid 1 to become free to service these requests.
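One way to probe this hypothesis (a sketch added for illustration, not from the thread; it assumes Julia 0.3-era APIs such as Base.shmem_fill and the two-argument remotecall_fetch): if the cost is each worker fetching its shmem mapping via pid 1, then "warming up" every worker once before timing should flatten the per-pid times.

```julia
# Hypothetical check: touch the shared array once on every worker so each
# one resolves its local shmem mapping up front.
A = Base.shmem_fill(1, (1000, 1000))

for p in workers()
    remotecall_fetch(p, x -> x[1], A)   # forces deserialization/mapping on p
end

# Re-run the original timing loop; if the hypothesis is right, the per-pid
# times should now be uniform instead of summing to ~40 ms per cycle.
for i in 1:100
    t1 = time(); p = 2 + (i % nworkers()); remotecall_fetch(p, x -> 1, A); t2 = time()
    println("@ $p ", int((t2 - t1) * 1000))
end
```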
On Thu, Mar 27, 2014 at 5:58 PM, Amit Murthy <[email protected]> wrote:

> Some more weirdness.
>
> Starting with julia -p 8:
>
>     A = Base.shmem_fill(1, (1000,1000))
>
> Using 2 workers:
>
>     for i in 1:100
>         t1 = time(); p = 2 + (i % 2); remotecall_fetch(p, x->1, A); t2 = time()
>         println("@ $p ", int((t2 - t1) * 1000))
>     end
>
> prints
>
>     ...
>     @ 3 8
>     @ 2 32
>     @ 3 8
>     @ 2 32
>     @ 3 8
>     @ 2 32
>     @ 3 8
>     @ 2 32
>
> Notice that pid 2 always takes 32 milliseconds while pid 3 always takes 8.
>
> With 4 workers:
>
>     for i in 1:100
>         t1 = time(); p = 2 + (i % 4); remotecall_fetch(p, x->1, A); t2 = time()
>         println("@ $p ", int((t2 - t1) * 1000))
>     end
>
>     ...
>     @ 2 31
>     @ 3 4
>     @ 4 4
>     @ 5 1
>     @ 2 31
>     @ 3 4
>     @ 4 4
>     @ 5 1
>     @ 2 31
>     @ 3 4
>     @ 4 4
>     @ 5 1
>     @ 2 31
>
> Now pid 2 always takes 31 milliseconds, pids 3 and 4 take 4, and pid 5 takes 1 millisecond.
>
> With 8 workers:
>
>     for i in 1:100
>         t1 = time(); p = 2 + (i % 8); remotecall_fetch(p, x->1, A); t2 = time()
>         println("@ $p ", int((t2 - t1) * 1000))
>     end
>
>     ...
>     @ 2 20
>     @ 3 4
>     @ 4 1
>     @ 5 3
>     @ 6 4
>     @ 7 1
>     @ 8 2
>     @ 9 4
>     @ 2 20
>     @ 3 4
>     @ 4 1
>     @ 5 3
>     @ 6 4
>     @ 7 1
>     @ 8 2
>     @ 9 4
>     @ 2 20
>     @ 3 4
>     @ 4 1
>     @ 5 3
>     @ 6 4
>     @ 7 1
>     @ 8 3
>     @ 9 4
>     @ 2 20
>     @ 3 4
>     @ 4 1
>     @ 5 3
>     @ 6 4
>
> pid 2 is always 20 milliseconds, and the rest are consistent too.
>
> Any explanations?
>
> On Thu, Mar 27, 2014 at 5:24 PM, Amit Murthy <[email protected]> wrote:
>
>> I think the code does not do what you want.
>>
>> In the non-shared case you are sending a 10^6 integer array over the
>> network 1000 times and summing it as many times. Most of the time is
>> network traffic time. Reduce n to, say, 10, and you will see what I mean.
>>
>> In the shared case you are not sending the array over the network, but
>> you are still summing the entire array 1000 times. Some of the
>> remotecall_fetch calls seem to be taking 40 milliseconds of extra time,
>> which adds to the total.
>>
>> The shared time of 6 seconds being less than the 15 seconds for the
>> non-shared case seems to be just incidental.
>>
>> I don't yet have an explanation for the extra 40 milliseconds per
>> remotecall_fetch (for some calls only) in the shared case.
>>
>> On Thu, Mar 27, 2014 at 2:50 PM, Mikael Simberg <[email protected]> wrote:
>>
>>> Hi,
>>> I'm having some trouble figuring out exactly how I'm supposed to use
>>> SharedArrays - I might just be misunderstanding them, or else something
>>> odd is happening with them.
>>>
>>> I'm trying to do some parallel computing which looks a bit like this
>>> test case:
>>>
>>>     function createdata(shared)
>>>         const n = 1000
>>>         if shared
>>>             A = SharedArray(Uint, (n, n))
>>>         else
>>>             A = Array(Uint, (n, n))
>>>         end
>>>         for i = 1:n, j = 1:n
>>>             A[i, j] = rand(Uint)
>>>         end
>>>         return n, A
>>>     end
>>>
>>>     function mainfunction(r; shared = false)
>>>         n, A = createdata(shared)
>>>
>>>         i = 1
>>>         nextidx() = (idx = i; i += 1; idx)
>>>
>>>         @sync begin
>>>             for p in workers()
>>>                 @async begin
>>>                     while true
>>>                         idx = nextidx()
>>>                         if idx > r
>>>                             break
>>>                         end
>>>                         found, s = remotecall_fetch(p, parfunction, n, A)
>>>                     end
>>>                 end
>>>             end
>>>         end
>>>     end
>>>
>>>     function parfunction(n::Int, A::Array{Uint, 2})
>>>         # possibly do some other computation here independent of shared arrays
>>>         s = sum(A)
>>>         return false, s
>>>     end
>>>
>>>     function parfunction(n::Int, A::SharedArray{Uint, 2})
>>>         s = sum(A)
>>>         return false, s
>>>     end
>>>
>>> If I then start julia with, e.g., two worker processes (julia -p 2), the
>>> following happens:
>>>
>>>     julia> require("testpar.jl")
>>>
>>>     julia> @time mainfunction(1000, shared = false)
>>>     elapsed time: 15.717117365 seconds (8448701068 bytes allocated)
>>>
>>>     julia> @time mainfunction(1000, shared = true)
>>>     elapsed time: 6.068758627 seconds (56713996 bytes allocated)
>>>
>>>     julia> rmprocs([2, 3])
>>>     :ok
>>>
>>>     julia> @time mainfunction(1000, shared = false)
>>>     elapsed time: 0.717638344 seconds (40357664 bytes allocated)
>>>
>>>     julia> @time mainfunction(1000, shared = true)
>>>     elapsed time: 0.702174085 seconds (32680628 bytes allocated)
>>>
>>> So, with a normal array it's slow as expected, and it is faster with the
>>> shared array, but what seems to happen is that with the normal array CPU
>>> usage is 100% on two cores, while with the shared array CPU usage spikes
>>> for a fraction of a second and then sits at around 10% for the remaining
>>> nearly 6 seconds. Can anyone reproduce this? Am I just doing something
>>> wrong with shared arrays?
>>>
>>> On a slightly related note: is there now a way to create a random shared
>>> array? https://github.com/JuliaLang/julia/pull/4939 and the latest docs
>>> don't mention this.
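Regarding the random-shared-array question at the end of the quoted message, a possible sketch (added for illustration, not from the thread; it assumes the init keyword that the SharedArray constructor gained around that time, together with the localindexes helper - check your Julia version before relying on either):

```julia
# Hypothetical sketch: the init function runs on each participating worker,
# so each worker fills only its own chunk of the array with random values.
S = SharedArray(Uint, (1000, 1000); init = S -> begin
    for i in localindexes(S)
        S[i] = rand(Uint)
    end
end)
```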
