Hi Tim,

The issue of the extra 40 milliseconds is related to how RemoteRefs to the
individual mappings are fetched. I don't quite see how pmap_bw is related
to this, though.
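
A quick way to isolate where the extra time goes (an untested sketch; A is
the shared array from the examples below) is to time the same
remotecall_fetch on each worker with and without the SharedArray argument:

for p in workers()
    t1 = time(); remotecall_fetch(p, () -> 1); t2 = time()
    t3 = time(); remotecall_fetch(p, x -> 1, A); t4 = time()
    # If only the second timing shows the extra cost, it is the shipping of
    # the SharedArray object (and its mapping lookup), not the call itself.
    println("@ $p plain: ", int((t2-t1)*1000), " with A: ", int((t4-t3)*1000))
end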


On Thu, Mar 27, 2014 at 6:27 PM, Tim Holy <[email protected]> wrote:

> This is why, with my original implementation of SharedArrays
> (oh-so-long-ago),
> I created pmap_bw, to do busy-wait on the return value of a SharedArray
> computation. The amusing part is that you can use a SharedArray to do the
> synchronization among processes.
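>
> A minimal sketch of that idea (illustrative only - sum_chunk here stands in
> for whatever per-worker computation you run; it is not a real function):
>
> done = Base.shmem_fill(0, (nworkers(),))   # one shared flag per worker
> for (i, p) in enumerate(workers())
>     # fire and forget; each worker sets its flag in shared memory when done
>     remotecall(p, (d, i) -> (sum_chunk(); d[i] = 1), done, i)
> end
> while sum(done) < nworkers()   # busy-wait on the shared flags
>     yield()
> end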
>
> --Tim
>
> On Thursday, March 27, 2014 06:11:12 PM Amit Murthy wrote:
> > There is a pattern here. For a set of pids, the cumulative sum is 40
> > milliseconds. In a SharedArray, RemoteRefs are maintained on the creating
> > pid (in this case 1) to the shmem mappings on each of the workers. I
> > think the workers are referring back to pid 1 to fetch the local mapping
> > when the shared array object is passed in the remotecall_fetch call, and
> > hence all the workers are stuck waiting for pid 1 to become free to
> > service these calls.
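> >
> > If that is right, then touching A once on each worker before the timed
> > loop should make the later calls fast (a speculative check, not something
> > I have verified):
> >
> > for p in workers()
> >     remotecall_fetch(p, x -> nothing, A)  # set up the shmem mapping once
> > end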
> >
> > On Thu, Mar 27, 2014 at 5:58 PM, Amit Murthy <[email protected]> wrote:
> > > Some more weirdness
> > >
> > > Starting with julia -p 8
> > >
> > > A = Base.shmem_fill(1, (1000,1000))
> > >
> > > Using 2 workers:
> > > for i in 1:100
> > >     t1 = time(); p = 2 + (i % 2); remotecall_fetch(p, x -> 1, A); t2 = time()
> > >     println("@ $p ", int((t2-t1) * 1000))
> > > end
> > >
> > > prints
> > >
> > > ...
> > > @ 3 8
> > > @ 2 32
> > > @ 3 8
> > > @ 2 32
> > > @ 3 8
> > > @ 2 32
> > > @ 3 8
> > > @ 2 32
> > >
> > >
> > > Notice that pid 2 always takes 32 milliseconds while pid 3 always
> > > takes 8.
> > >
> > >
> > >
> > > With 4 workers:
> > >
> > > for i in 1:100
> > >     t1 = time(); p = 2 + (i % 4); remotecall_fetch(p, x -> 1, A); t2 = time()
> > >     println("@ $p ", int((t2-t1) * 1000))
> > > end
> > >
> > > ...
> > > @ 2 31
> > > @ 3 4
> > > @ 4 4
> > > @ 5 1
> > > @ 2 31
> > > @ 3 4
> > > @ 4 4
> > > @ 5 1
> > > @ 2 31
> > > @ 3 4
> > > @ 4 4
> > > @ 5 1
> > > @ 2 31
> > >
> > >
> > > Now pid 2 always takes 31 milliseconds, pids 3 and 4 take 4
> > > milliseconds each, and pid 5 takes 1 millisecond.
> > >
> > > With 8 workers:
> > >
> > > for i in 1:100
> > >     t1 = time(); p = 2 + (i % 8); remotecall_fetch(p, x -> 1, A); t2 = time()
> > >     println("@ $p ", int((t2-t1) * 1000))
> > > end
> > >
> > > ...
> > > @ 2 20
> > > @ 3 4
> > > @ 4 1
> > > @ 5 3
> > > @ 6 4
> > > @ 7 1
> > > @ 8 2
> > > @ 9 4
> > > @ 2 20
> > > @ 3 4
> > > @ 4 1
> > > @ 5 3
> > > @ 6 4
> > > @ 7 1
> > > @ 8 2
> > > @ 9 4
> > > @ 2 20
> > > @ 3 4
> > > @ 4 1
> > > @ 5 3
> > > @ 6 4
> > > @ 7 1
> > > @ 8 3
> > > @ 9 4
> > > @ 2 20
> > > @ 3 4
> > > @ 4 1
> > > @ 5 3
> > > @ 6 4
> > >
> > >
> > > pid 2 always takes 20 milliseconds, while the rest are pretty
> > > consistent too.
> > >
> > > Any explanations?
> > >
> > > On Thu, Mar 27, 2014 at 5:24 PM, Amit Murthy <[email protected]> wrote:
> > >> I think the code does not do what you want.
> > >>
> > >> In the non-shared case you are sending a 10^6-element integer array
> > >> over the network 1000 times and summing it as many times. Most of the
> > >> time is network traffic. Reduce 'n' to, say, 10, and you will see what
> > >> I mean.
> > >>
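> > >> Back-of-the-envelope, just to make that concrete (assuming a 64-bit
> > >> machine, where sizeof(Uint) is 8):
> > >>
> > >> sizeof(Uint) * 1000 * 1000   # 8000000 bytes shipped per call
> > >> 8000000 * 1000 / 1e9         # ~8 GB over 1000 calls, roughly matching
> > >>                              # the 8448701068 bytes in the @time output
> > >>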
> > >> In the shared case you are not sending the array over the network, but
> > >> you are still summing the entire array 1000 times. Some of the
> > >> remotecall_fetch calls seem to be taking 40 milliseconds of extra time,
> > >> which adds to the total.
> > >>
> > >> The shared time of 6 seconds being less than the 15 seconds for the
> > >> non-shared case seems to be just incidental.
> > >>
> > >> I don't yet have an explanation for the extra 40 milliseconds per
> > >> remotecall_fetch (for some calls only) in the shared case.
> > >>
> > >> On Thu, Mar 27, 2014 at 2:50 PM, Mikael Simberg <[email protected]> wrote:
> > >>> Hi,
> > >>> I'm having some trouble figuring out exactly how I'm supposed to use
> > >>> SharedArrays - I might just be misunderstanding them, or else
> > >>> something odd is happening with them.
> > >>>
> > >>> I'm trying to do some parallel computing which looks a bit like this
> > >>> test case:
> > >>>
> > >>> function createdata(shared)
> > >>>     const n = 1000
> > >>>     if shared
> > >>>         A = SharedArray(Uint, (n, n))
> > >>>     else
> > >>>         A = Array(Uint, (n, n))
> > >>>     end
> > >>>     for i = 1:n, j = 1:n
> > >>>         A[i, j] = rand(Uint)
> > >>>     end
> > >>>     return n, A
> > >>> end
> > >>>
> > >>> function mainfunction(r; shared = false)
> > >>>     n, A = createdata(shared)
> > >>>
> > >>>     i = 1
> > >>>     nextidx() = (idx = i; i += 1; idx)
> > >>>
> > >>>     @sync begin
> > >>>         for p in workers()
> > >>>             @async begin
> > >>>                 while true
> > >>>                     idx = nextidx()
> > >>>                     if idx > r
> > >>>                         break
> > >>>                     end
> > >>>                     found, s = remotecall_fetch(p, parfunction, n, A)
> > >>>                 end
> > >>>             end
> > >>>         end
> > >>>     end
> > >>> end
> > >>>
> > >>> function parfunction(n::Int, A::Array{Uint, 2})
> > >>>     # possibly do some other computation here independent of shared arrays
> > >>>     s = sum(A)
> > >>>     return false, s
> > >>> end
> > >>>
> > >>> function parfunction(n::Int, A::SharedArray{Uint, 2})
> > >>>     s = sum(A)
> > >>>     return false, s
> > >>> end
> > >>>
> > >>> If I then start julia with e.g. two worker processes (julia -p 2),
> > >>> the following happens:
> > >>>
> > >>> julia> require("testpar.jl")
> > >>>
> > >>> julia> @time mainfunction(1000, shared = false)
> > >>> elapsed time: 15.717117365 seconds (8448701068 bytes allocated)
> > >>>
> > >>> julia> @time mainfunction(1000, shared = true)
> > >>> elapsed time: 6.068758627 seconds (56713996 bytes allocated)
> > >>>
> > >>> julia> rmprocs([2, 3])
> > >>>
> > >>> :ok
> > >>>
> > >>> julia> @time mainfunction(1000, shared = false)
> > >>> elapsed time: 0.717638344 seconds (40357664 bytes allocated)
> > >>>
> > >>> julia> @time mainfunction(1000, shared = true)
> > >>> elapsed time: 0.702174085 seconds (32680628 bytes allocated)
> > >>>
> > >>> So, with a normal array it's slow as expected, and it is faster with
> > >>> the shared array. But what seems to happen is that with the normal
> > >>> array CPU usage is 100% on two cores, while with the shared array CPU
> > >>> usage spikes for a fraction of a second and then sits at around 10%
> > >>> for the remaining nearly 6 seconds. Can anyone reproduce this? Am I
> > >>> just doing something wrong with shared arrays?
> > >>>
> > >>> Slightly related note: is there now a way to create a random shared
> > >>> array? https://github.com/JuliaLang/julia/pull/4939 and the latest
> > >>> docs don't mention this.
>
