Hi Mikael,

This seems to be a bug in the SharedArray constructor. For SharedArrays of
length less than the number of participating pids, only the first few pids
are used. Since the length of s = SharedArray(Uint, (1)) is 1, it is mapped
only on the first process.

For now, a workaround is to create s = SharedArray(Uint, (10)) or
something similar and use only the first element.
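
A sketch of the padded workaround (procs(s) on a SharedArray returns the
pids it is mapped on, so you can check that every worker participates):

    s = SharedArray(Uint, (10))  # padded so it maps onto all workers
    procs(s)                     # should now include every participating pid
    s[1] = uint(0)               # use only the first element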

On Thu, Mar 27, 2014 at 7:13 PM, Mikael Simberg <[email protected]> wrote:

>  Yes, you're at least half-right about it not doing quite what I want. Or
> let's say I was expecting the majority of the overhead to come from having
> to send the array over to each process, but what I wasn't expecting was
> that getting a boolean and an integer back would take so much time (and
> thus I expected that using a SharedArray would be at least
> comparable to keeping everything local). Indeed, if I just do a remotecall
> (i.e. without the fetch) it is faster with multiple processes, which is
> what I was expecting.
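>
> (To check my understanding of the difference, assuming the 0.3 API:
> remotecall returns a RemoteRef immediately, while remotecall_fetch blocks
> until the result comes back, i.e. roughly:
>
>     rr = remotecall(p, parfunction, A, s)       # returns a RemoteRef at once
>     v = fetch(rr)                               # this is where the waiting happens
>     v = remotecall_fetch(p, parfunction, A, s)  # one blocking round trip
>
> so dropping the fetch also drops the wait.)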
>
> What I essentially want to do in the end is this: parfunction() succeeds
> with some probability, and when it does I want to return some object from
> the calculations there, but in general I will not want to fetch anything.
> What would be the "correct" way to do that? If I have the following code:
>
> function mainfunction(r)
>
>     const n = 1000
>     A = SharedArray(Uint, (n, n))
>     for i = 1:n, j = 1:n
>         A[i, j] = rand(Uint)
>     end
>     s = SharedArray(Uint, (1))
>
>     i = 1
>     nextidx() = (idx = i; i += 1; idx)
>
>     println(s)
>     @sync begin
>         for p in workers()
>             @async begin
>                 while true
>                     idx = nextidx()
>                     if idx > r
>                         break
>                     end
>                     remotecall(p, parfunction, A, s)
>                 end
>             end
>         end
>     end
>     println(s)
> end
>
> function parfunction(A::SharedArray{Uint, 2}, s::SharedArray{Uint, 1})
>     d = sum(A)
>     if rand(0:1000) == 0
>         println("success")
>         s[1] = d
>     end
> end
>
> and run
> julia -p 2
> julia> reload("testpar.jl")
> julia> @time mainfunction(5000)
>
> I get ERROR: SharedArray cannot be used on a non-participating process,
> although by my logic s should be available on all processes (I'm
> assuming it's s that's causing it, because it's fine if I remove all traces
> of s).
>
> On Thu, Mar 27, 2014, at 4:54, Amit Murthy wrote:
>
> I think the code does not do what you want.
>
> In the non-shared case you are sending a 10^6-element integer array over
> the network 1000 times and summing it as many times. Most of the time is
> network traffic. Reduce 'n' to, say, 10, and you will see what I mean.
>
> In the shared case you are not sending the array over the network, but you
> are still summing the entire array 1000 times. Some of the remotecall_fetch
> calls seem to be taking 40 milliseconds of extra time, which adds to the
> total.
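>
> One way to see the shipping cost in isolation (a sketch, with A the plain
> Array and S a SharedArray of the same size; the worker discards its
> argument, so the elapsed time is essentially just argument transfer):
>
>     @time remotecall_fetch(2, x -> nothing, A)  # Array: serializes ~8MB per call
>     @time remotecall_fetch(2, x -> nothing, S)  # SharedArray: only a reference is sent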
>
> The shared time of 6 seconds being less than the 15 seconds for the
> non-shared case seems to be just incidental.
>
> I don't yet have an explanation for the extra 40 milliseconds per
> remotecall_fetch (for some calls only) in the shared case.
>
> On Thu, Mar 27, 2014 at 2:50 PM, Mikael Simberg <[email protected]> wrote:
>
> Hi,
>  I'm having some trouble figuring out exactly how I'm supposed to use
>  SharedArrays - I might just be misunderstanding them or else something
>  odd is happening with them.
>
>  I'm trying to do some parallel computing which looks a bit like this
>  test case:
>
>  function createdata(shared)
>      const n = 1000
>      if shared
>          A = SharedArray(Uint, (n, n))
>      else
>          A = Array(Uint, (n, n))
>      end
>      for i = 1:n, j = 1:n
>          A[i, j] = rand(Uint)
>      end
>
>      return n, A
>  end
>
>  function mainfunction(r; shared = false)
>      n, A = createdata(shared)
>
>      i = 1
>      nextidx() = (idx = i; i += 1; idx)
>
>      @sync begin
>          for p in workers()
>              @async begin
>                  while true
>                      idx = nextidx()
>                      if idx > r
>                          break
>                      end
>                      found, s = remotecall_fetch(p, parfunction, n, A)
>                  end
>              end
>          end
>      end
>  end
>
>  function parfunction(n::Int, A::Array{Uint, 2})
>      # possibly do some other computation here independent of shared arrays
>      s = sum(A)
>      return false, s
>  end
>
>  function parfunction(n::Int, A::SharedArray{Uint, 2})
>      s = sum(A)
>      return false, s
>  end
>
>  If I then start julia with e.g. two worker processes, so julia -p 2, the
>  following happens:
>
>  julia> require("testpar.jl")
>
>  julia> @time mainfunction(1000, shared = false)
>  elapsed time: 15.717117365 seconds (8448701068 bytes allocated)
>
>  julia> @time mainfunction(1000, shared = true)
>  elapsed time: 6.068758627 seconds (56713996 bytes allocated)
>
>  julia> rmprocs([2, 3])
>  :ok
>
>  julia> @time mainfunction(1000, shared = false)
>  elapsed time: 0.717638344 seconds (40357664 bytes allocated)
>
>  julia> @time mainfunction(1000, shared = true)
>  elapsed time: 0.702174085 seconds (32680628 bytes allocated)
>
>  So, with a normal array it's slow as expected, and it is faster with the
>  shared array, but what seems to happen is that with the normal array CPU
>  usage is 100% on two cores, while with the shared array CPU usage spikes
>  for a fraction of a second and then sits at around 10% for the remaining
>  nearly 6 seconds. Can anyone reproduce this? Am I just doing
>  something wrong with shared arrays?
>
>  Slightly related note: is there now a way to create a random shared
>  array? Neither https://github.com/JuliaLang/julia/pull/4939 nor the
>  latest docs mention one.
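>
>  (The closest thing I've found is the init keyword on the constructor,
>  where each worker fills its own local chunk; a sketch, if I'm reading
>  the constructor right:
>
>  S = SharedArray(Uint, (1000, 1000), init = s -> begin
>      for i in localindexes(s)  # each worker touches only its local part
>          s[i] = rand(Uint)
>      end
>  end)
>
>  but a built-in rand for shared arrays would still be nice.)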
>
