Mikael, any type of parallelism has implicit overhead. You are not doing
nearly enough work to amortize this overhead.
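One way to see this is to batch the work so each remote call carries many iterations instead of one; the messaging and scheduling cost is then paid once per batch. A rough sketch against your code below (parfunction_batched and the chunk size of 100 are made up for illustration; like parfunction, it would need to be defined on all processes):

function parfunction_batched(A::SharedArray{Uint, 2}, s::SharedArray{Uint, 1}, chunk::Int)
    # One remote call now covers `chunk` summations, amortizing the
    # per-message overhead over all of them.
    for k = 1:chunk
        d = sum(A)
        if rand(0:1000) == 0
            s[1] = d
        end
    end
end

and then, e.g., replacing the inner loop in mainfunction with 100 calls of 100 iterations each instead of 10000 calls of one:

@sync begin
    for k = 1:div(10000, 100)
        @spawn parfunction_batched(A, s, 100)
    end
end

Even then, sum(A) over a 1000x1000 array is only a few milliseconds of work, so you need fairly large batches before the parallel version wins.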
On Thursday, March 27, 2014 2:31:14 PM UTC-4, Mikael Simberg wrote:
>
> All right, thanks! That gets rid of the error.
>
> If you don't mind, though, I have some more questions, because I'm still
> doing something wrong or my expectations are too high. So again I have code
> similar to before:
>
> function mainfunction(r; single = false)
>     const n = 1000
>     A = SharedArray(Uint, (n, n))
>     for i = 1:n, j = 1:n
>         A[i, j] = rand(Uint)
>     end
>     s = SharedArray(Uint, (nprocs()))
>
>     i = 1
>     nextidx() = (idx = i; i += 1; idx)
>
>     if single
>         for i = 1:r
>             parfunction(A, s)
>         end
>     else
>         @sync begin
>             for p in workers()
>                 @async begin
>                     while true
>                         idx = nextidx()
>                         if idx > r
>                             break
>                         end
>                         @spawn parfunction(A, s)
>                     end
>                 end
>             end
>         end
>     end
> end
>
> function parfunction(A::SharedArray{Uint, 2}, s::SharedArray{Uint, 1})
>     d = sum(A)
>     if rand(0:1000) == 0
>         s[1] = d
>     end
> end
>
> I start julia with -p 4, run the function once first to warm up, and then:
>
> julia> @time mainfunction(10000, single = false)
> elapsed time: 8.762498191 seconds (403413200 bytes allocated)
>
> julia> @time mainfunction(10000, single = false)
> elapsed time: 7.433603658 seconds (477360360 bytes allocated)
>
> julia> @time mainfunction(10000, single = false)
> elapsed time: 7.379368673 seconds (477509296 bytes allocated)
>
> julia> rmprocs([2, 3, 4, 5])
> :ok
>
> julia> @time mainfunction(10000, single = false)
> elapsed time: 7.3204546 seconds (56925308 bytes allocated)
>
> julia> @time mainfunction(10000, single = false)
> elapsed time: 10.21160855 seconds (45163964 bytes allocated)
>
> julia> @time mainfunction(10000, single = false)
> elapsed time: 10.133408252 seconds (44785608 bytes allocated)
>
> julia> @time mainfunction(10000, single = true)
> elapsed time: 9.117599749 seconds (23997456 bytes allocated)
>
> julia> @time mainfunction(10000, single = true)
> elapsed time: 6.193505021 seconds (23997488 bytes allocated)
>
> julia> @time mainfunction(10000, single = true)
> elapsed time: 6.189335567 seconds (23997552 bytes allocated)
>
> So besides the times varying quite a lot, is this as good as I can hope
> for with a function like mine, i.e. in practice there is no speedup? Also,
> is a separate for loop that just runs on a single process, like the one
> above, the best way to handle the case where I know I have only a single
> process? (Obviously I can check how many I have with nworkers, but I don't
> know whether, for example, pmap is slower than a normal map on just a
> single process.)
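>
> (For concreteness, the kind of branch I mean is something like the sketch
> below; runall is just a made-up name.)
>
>     function runall(r, A, s)
>         if nworkers() > 1
>             # parallel path: spread the r calls over the workers
>             @sync begin
>                 for p in workers()
>                     @async begin
>                         for k = 1:div(r, nworkers())
>                             remotecall_wait(p, parfunction, A, s)
>                         end
>                     end
>                 end
>             end
>         else
>             # serial path: plain loop on the local process
>             for k = 1:r
>                 parfunction(A, s)
>             end
>         end
>     end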
>
>
>
> On Thu, Mar 27, 2014, at 7:06, Amit Murthy wrote:
>
> Hi Mikael,
>
> This seems to be a bug in the SharedArray constructor. For SharedArrays
> whose length is less than the number of participating pids, only the first
> few pids are used. Since s = SharedArray(Uint, (1)) has length 1, it is
> mapped only on the first process.
>
> For now, a workaround is to create s = SharedArray(Uint, (10)) or similar
> and use only the first element.
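>
> Concretely (the length 10 is arbitrary; anything at least nprocs() long
> avoids the bug):
>
>     # Workaround sketch: over-allocate so every pid participates in
>     # the mapping, then use only the first element.
>     s = SharedArray(Uint, (10))
>     s[1] = 0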
>
>
>
> On Thu, Mar 27, 2014 at 7:13 PM, Mikael Simberg <[email protected]> wrote:
>
>
> Yes, you're at least half-right about it not doing quite what I want. Or
> let's put it this way: I was expecting the majority of the overhead to come
> from having to send the array over to each process, but I wasn't expecting
> that getting a boolean and an integer back would take so much time (and
> thus I was expecting that using a SharedArray would be at least comparable
> to keeping everything local). Indeed, if I just do a remotecall (i.e.
> without the fetch) it is faster with multiple processes, which is what
> I was expecting.
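>
> (That is, the difference between the two forms below, where the worker
> id 2 is just an example:
>
>     rr = remotecall(2, parfunction, A, s)        # returns a RemoteRef immediately
>     v = remotecall_fetch(2, parfunction, A, s)   # blocks until the worker replies
>
> so with plain remotecall the feeder task never waits for the worker.)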
>
> What I essentially want to do in the end is this: parfunction() succeeds
> with some probability, and when it does I want to return some object from
> the calculations there, but in general I will not want to fetch anything.
> What would be the "correct" way to do that? If I have the following code:
>
> function mainfunction(r)
>     const n = 1000
>     A = SharedArray(Uint, (n, n))
>     for i = 1:n, j = 1:n
>         A[i, j] = rand(Uint)
>     end
>     s = SharedArray(Uint, (1))
>
>     i = 1
>     nextidx() = (idx = i; i += 1; idx)
>
>     println(s)
>     @sync begin
>         for p in workers()
>             @async begin
>                 while true
>                     idx = nextidx()
>                     if idx > r
>                         break
>                     end
>                     remotecall(p, parfunction, A, s)
>                 end
>             end
>         end
>     end
>     println(s)
> end
>
> function parfunction(A::SharedArray{Uint, 2}, s::SharedArray{Uint, 1})
>     d = sum(A)
>     if rand(0:1000) == 0
>         println("success")
>         s[1] = d
>     end
> end
>
> and run
> julia -p 2
> julia> reload("testpar.jl")
> julia> @time mainfunction(5000)
>
> I get ERROR: SharedArray cannot be used on a non-participating process,
> although by my logic s should be available on all processes (I'm assuming
> s is the culprit, because everything works if I remove all traces of it).
>
> On Thu, Mar 27, 2014, at 4:54, Amit Murthy wrote:
>
> I think the code does not do what you want.
>
> In the non-shared case you are sending a 10^6-element integer array over
> the network 1000 times and summing it as many times. Most of the elapsed
> time is network traffic. Reduce 'n' to, say, 10, and you will see what
> I mean.
>
> In the shared case you are not sending the array over the network, but you
> are still summing the entire array 1000 times. Some of the remotecall_fetch
> calls seem to take an extra 40 milliseconds each, which adds to the total.
>
> The shared time of 6 seconds being less than the 15 seconds for non-shared
> seems to be incidental.
>
> I don't yet have an explanation for the extra 40 milliseconds per
> remotecall_fetch (for some calls only) in the shared case.
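>
> A quick way to see the serialization cost (rough sketch; assumes a worker
> with id 2 exists):
>
>     # With an ordinary Array the argument is serialized and shipped on
>     # every call (~8 MB here); with a SharedArray only a handle is sent.
>     A = [rand(Uint) for i = 1:1000, j = 1:1000]
>     @time for k = 1:100
>         remotecall_fetch(2, sum, A)
>     end
>
>     S = SharedArray(Uint, (1000, 1000))
>     @time for k = 1:100
>         remotecall_fetch(2, sum, S)
>     end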
>
> On Thu, Mar 27, 2014 at 2:50 PM, Mikael Simberg <[email protected]> wrote:
>
> Hi,
> I'm having some trouble figuring out exactly how I'm supposed to use
> SharedArrays - I might just be misunderstanding them or else something
> odd is happening with them.
>
> I'm trying to do some parallel computing which looks a bit like this
> test case:
>
> function createdata(shared)
>     const n = 1000
>     if shared
>         A = SharedArray(Uint, (n, n))
>     else
>         A = Array(Uint, (n, n))
>     end
>     for i = 1:n, j = 1:n
>         A[i, j] = rand(Uint)
>     end
>
>     return n, A
> end
>
> function mainfunction(r; shared = false)
>     n, A = createdata(shared)
>
>     i = 1
>     nextidx() = (idx = i; i += 1; idx)
>
>     @sync begin
>         for p in workers()
>             @async begin
>                 while true
>                     idx = nextidx()
>                     if idx > r
>                         break
>                     end
>                     found, s = remotecall_fetch(p, parfunction, n, A)
>                 end
>             end
>         end
>     end
> end
>
> function parfunction(n::Int, A::Array{Uint, 2})
>     # possibly do some other computation here independent of shared arrays
>     s = sum(A)
>     return false, s
> end
>
> function parfunction(n::Int, A::SharedArray{Uint, 2})
>     s = sum(A)
>     return false, s
> end
>
> If I then start julia with e.g. two worker processes, so julia -p 2, the
> following happens:
>
> julia> require("testpar.jl")
>
> julia> @time mainfunction(1000, shared = false)
> elapsed time: 15.717117365 seconds (8448701068 bytes allocated)
>
> julia> @time mainfunction(1000, shared = true)
> elapsed time: 6.068758627 seconds (56713996 bytes allocated)
>
> julia> rmprocs([2, 3])
> :ok
>
> julia> @time mainfunction(1000, shared = false)
> elapsed time: 0.717638344 seconds (40357664 bytes allocated)
>
> julia> @time mainfunction(1000, shared = true)
> elapsed time: 0.702174085 seconds (32680628 bytes allocated)
>
> So with a normal array it's slow, as expected, and it's faster with the
> shared array. But what seems to happen is that with the normal array CPU
> usage is 100% on two cores, while with the shared array CPU usage spikes
> for a fraction of a second and then sits at around 10% for the remaining
> nearly 6 seconds. Can anyone reproduce this? Am I just doing something
> wrong with shared arrays?
>
> On a slightly related note: is there now a way to create a random shared
> array? https://github.com/JuliaLang/julia/pull/4939 and the latest docs
> don't mention one.
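>
> The closest I have found is the init keyword argument, something like the
> sketch below, but I don't know if that is the intended way:
>
>     # Each worker fills only its local chunk at construction time.
>     A = SharedArray(Uint, (1000, 1000); init = S -> begin
>         for i in localindexes(S)
>             S[i] = rand(Uint)
>         end
>     end)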
>