Hi Mikael,

This seems to be a bug in the SharedArray constructor. For SharedArrays of
length less than the number of participating pids, only the first few pids
are used. Since the length of s = SharedArray(Uint, (1)) is 1, it is mapped
only on the first process. For now, a workaround is to create
s = SharedArray(Uint, (10)) or similar and use only the first element.
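For example, here is a minimal, untested sketch of both the symptom and the
workaround, using the same Julia 0.3-era syntax as your code (the anonymous
indexing function is just for illustration):

    # started as: julia -p 2, so there are 3 participating pids
    s = SharedArray(Uint, (1))    # length 1 < number of pids: mapped on pid 1 only
    remotecall_fetch(workers()[end], x -> x[1], s)
    # should throw: ERROR: SharedArray cannot be used on a non-participating process

    s = SharedArray(Uint, (10))   # length >= number of pids: mapped on all of them
    remotecall_fetch(workers()[end], x -> x[1], s)    # works; just use s[1]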
On Thu, Mar 27, 2014 at 7:13 PM, Mikael Simberg <[email protected]> wrote:

> Yes, you're at least half-right about it not doing quite what I want. Or
> let's say I was expecting the majority of the overhead to come from having
> to send the array over to each process, but what I wasn't expecting was
> that getting a boolean and an integer back would take so much time (and
> thus I was expecting that using a SharedArray would be at least comparable
> to keeping everything local). Indeed, if I just do a remotecall (i.e.
> without the fetch) it is faster with multiple processes, which is what I
> was expecting.
>
> What I essentially want to do in the end is that parfunction() succeeds
> with some probability, and then I want to return some object from the
> calculations there, but in general I will not want to fetch anything. What
> would be the "correct" way to do that? If I have the following code:
>
> function mainfunction(r)
>     const n = 1000
>     A = SharedArray(Uint, (n, n))
>     for i = 1:n, j = 1:n
>         A[i, j] = rand(Uint)
>     end
>     s = SharedArray(Uint, (1))
>
>     i = 1
>     nextidx() = (idx = i; i += 1; idx)
>
>     println(s)
>     @sync begin
>         for p in workers()
>             @async begin
>                 while true
>                     idx = nextidx()
>                     if idx > r
>                         break
>                     end
>                     remotecall(p, parfunction, A, s)
>                 end
>             end
>         end
>     end
>     println(s)
> end
>
> function parfunction(A::SharedArray{Uint, 2}, s::SharedArray{Uint, 1})
>     d = sum(A)
>     if rand(0:1000) == 0
>         println("success")
>         s[1] = d
>     end
> end
>
> and run
>
> julia -p 2
> julia> reload("testpar.jl")
> julia> @time mainfunction(5000)
>
> I get ERROR: SharedArray cannot be used on a non-participating process,
> although s should according to my logic be available on all processes (I'm
> assuming it's s that's causing it, because it's fine if I remove all
> traces of s).
>
> On Thu, Mar 27, 2014, at 4:54, Amit Murthy wrote:
>
> I think the code does not do what you want.
>
> In the non-shared case you are sending a 10^6-integer array over the
> network 1000 times and summing it as many times. Most of the time is
> network traffic time. Reduce n to, say, 10, and you will see what I mean.
>
> In the shared case you are not sending the array over the network but
> still summing the entire array 1000 times. Some of the remotecall_fetch
> calls seem to be taking 40 milliseconds of extra time, which adds to the
> total.
>
> The shared time of 6 seconds being less than the 15 seconds for
> non-shared seems to be just incidental.
>
> I don't yet have an explanation for the extra 40 milliseconds per
> remotecall_fetch (for some calls only) in the shared case.
>
> On Thu, Mar 27, 2014 at 2:50 PM, Mikael Simberg <[email protected]> wrote:
>
> Hi,
> I'm having some trouble figuring out exactly how I'm supposed to use
> SharedArrays - I might just be misunderstanding them, or else something
> odd is happening with them.
>
> I'm trying to do some parallel computing which looks a bit like this
> test case:
>
> function createdata(shared)
>     const n = 1000
>     if shared
>         A = SharedArray(Uint, (n, n))
>     else
>         A = Array(Uint, (n, n))
>     end
>     for i = 1:n, j = 1:n
>         A[i, j] = rand(Uint)
>     end
>
>     return n, A
> end
>
> function mainfunction(r; shared = false)
>     n, A = createdata(shared)
>
>     i = 1
>     nextidx() = (idx = i; i += 1; idx)
>
>     @sync begin
>         for p in workers()
>             @async begin
>                 while true
>                     idx = nextidx()
>                     if idx > r
>                         break
>                     end
>                     found, s = remotecall_fetch(p, parfunction, n, A)
>                 end
>             end
>         end
>     end
> end
>
> function parfunction(n::Int, A::Array{Uint, 2})
>     # possibly do some other computation here independent of shared arrays
>     s = sum(A)
>     return false, s
> end
>
> function parfunction(n::Int, A::SharedArray{Uint, 2})
>     s = sum(A)
>     return false, s
> end
>
> If I then start julia with e.g. two worker processes, so julia -p 2, the
> following happens:
>
> julia> require("testpar.jl")
>
> julia> @time mainfunction(1000, shared = false)
> elapsed time: 15.717117365 seconds (8448701068 bytes allocated)
>
> julia> @time mainfunction(1000, shared = true)
> elapsed time: 6.068758627 seconds (56713996 bytes allocated)
>
> julia> rmprocs([2, 3])
> :ok
>
> julia> @time mainfunction(1000, shared = false)
> elapsed time: 0.717638344 seconds (40357664 bytes allocated)
>
> julia> @time mainfunction(1000, shared = true)
> elapsed time: 0.702174085 seconds (32680628 bytes allocated)
>
> So, with a normal array it's slow as expected, and it is faster with the
> shared array, but what seems to happen is that with the normal array CPU
> usage is 100% on two cores, while with the shared array CPU usage spikes
> for a fraction of a second and then stays at around 10% for the remaining
> nearly 6 seconds. Can anyone reproduce this? Am I just doing something
> wrong with shared arrays?
>
> Slightly related note: is there now a way to create a random shared
> array? https://github.com/JuliaLang/julia/pull/4939 and the latest docs
> don't mention this.
