Yes, you're at least half right that it doesn't do quite what I want.
Or rather: I was expecting the majority of the overhead to come from
having to send the array over to each process, but I wasn't expecting
that getting a boolean and an integer back would take so much time (and
thus I expected using a SharedArray to be at least comparable to
keeping everything local). Indeed, if I just do a remotecall (i.e.
without the fetch), it is faster with multiple processes, which is what
I was expecting.
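(Concretely, the difference I mean, using the names from my code below:

    # Fire-and-forget: returns a RemoteRef immediately, nothing shipped back
    remotecall(p, parfunction, n, A)

    # Blocking: waits for the worker and ships the result back over the wire
    found, s = remotecall_fetch(p, parfunction, n, A)

It's the second form that gets slow for me.)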
What I essentially want in the end is for parfunction() to succeed
with some probability and, on success, return some object from the
calculations there; in general I will not want to fetch anything. What
would be the "correct" way to do that? If I have the following code:
function mainfunction(r)
    const n = 1000
    A = SharedArray(Uint, (n, n))
    for i = 1:n, j = 1:n
        A[i, j] = rand(Uint)
    end
    s = SharedArray(Uint, (1,))
    i = 1
    nextidx() = (idx = i; i += 1; idx)
    println(s)
    @sync begin
        for p in workers()
            @async begin
                while true
                    idx = nextidx()
                    if idx > r
                        break
                    end
                    remotecall(p, parfunction, A, s)
                end
            end
        end
    end
    println(s)
end

function parfunction(A::SharedArray{Uint, 2}, s::SharedArray{Uint, 1})
    d = sum(A)
    if rand(0:1000) == 0
        println("success")
        s[1] = d
    end
end
and run
julia -p 2
julia> reload("testpar.jl")
julia> @time mainfunction(5000)
I get ERROR: SharedArray cannot be used on a non-participating process,
although by my logic s should be available on all processes. (I'm
assuming it's s that's causing it, because everything is fine if I
remove all traces of s.)
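One guess, which I haven't verified: maybe I need to pass the
participating pids explicitly when constructing s? Something like:

    # Untested guess: make every process (including 1) a participant
    s = SharedArray(Uint, (1,), pids=procs())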
On Thu, Mar 27, 2014, at 4:54, Amit Murthy wrote:
I think the code does not do what you want.
In the non-shared case you are sending a 10^6-element integer array
over the network 1000 times and summing it as many times. Most of the
time is network traffic time. Reduce 'n' to, say, 10, and you will see
what I mean.
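A quick way to see the transfer cost by itself (just a sketch; it
assumes a worker with id 2 exists):

    A = Array(Uint, (1000, 1000))           # ~8 MB of payload
    for i = 1:length(A); A[i] = rand(Uint); end
    @time remotecall_fetch(2, sum, A)       # ships the whole array, then sums it
    @time remotecall_fetch(2, () -> 0)      # empty round trip, for comparison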
In the shared case you are not sending the array over the network, but
you are still summing the entire array 1000 times. Some of the
remotecall_fetch calls seem to take an extra 40 milliseconds each,
which adds to the total.
The shared time of 6 seconds being less than the 15 seconds for
non-shared seems to be incidental.
I don't yet have an explanation for the extra 40 milliseconds per
remotecall_fetch (for some calls only) in the shared case.
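To narrow down where the time goes, one could time the pieces
separately (again just a sketch, assuming a worker with id 2 and
constructing with explicit participants):

    S = SharedArray(Uint, (1000, 1000), pids=procs())
    @time sum(S)                       # pure compute time, on this process
    @time remotecall_fetch(2, sum, S)  # compute plus round trip; the shared
                                       # array maps memory instead of copying
                                       # it, so a large gap here would point
                                       # at messaging overhead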
On Thu, Mar 27, 2014 at 2:50 PM, Mikael Simberg
<[email protected]> wrote:
Hi,
I'm having some trouble figuring out exactly how I'm supposed to use
SharedArrays - I might just be misunderstanding them, or something odd
is happening with them.
I'm trying to do some parallel computing which looks a bit like this
test case:
function createdata(shared)
    const n = 1000
    if shared
        A = SharedArray(Uint, (n, n))
    else
        A = Array(Uint, (n, n))
    end
    for i = 1:n, j = 1:n
        A[i, j] = rand(Uint)
    end
    return n, A
end
function mainfunction(r; shared = false)
    n, A = createdata(shared)
    i = 1
    nextidx() = (idx = i; i += 1; idx)
    @sync begin
        for p in workers()
            @async begin
                while true
                    idx = nextidx()
                    if idx > r
                        break
                    end
                    found, s = remotecall_fetch(p, parfunction, n, A)
                end
            end
        end
    end
end
function parfunction(n::Int, A::Array{Uint, 2})
    # possibly do some other computation here independent of shared arrays
    s = sum(A)
    return false, s
end

function parfunction(n::Int, A::SharedArray{Uint, 2})
    s = sum(A)
    return false, s
end
If I then start julia with e.g. two worker processes, so julia -p 2,
the following happens:
julia> require("testpar.jl")
julia> @time mainfunction(1000, shared = false)
elapsed time: 15.717117365 seconds (8448701068 bytes allocated)
julia> @time mainfunction(1000, shared = true)
elapsed time: 6.068758627 seconds (56713996 bytes allocated)
julia> rmprocs([2, 3])
:ok
julia> @time mainfunction(1000, shared = false)
elapsed time: 0.717638344 seconds (40357664 bytes allocated)
julia> @time mainfunction(1000, shared = true)
elapsed time: 0.702174085 seconds (32680628 bytes allocated)
So with a normal array it's slow, as expected, and it is faster with
the shared array. But what seems to happen is that with the normal
array CPU usage is 100% on two cores, while with the shared array CPU
usage spikes for a fraction of a second and then sits at around 10% for
the remaining nearly 6 seconds. Can anyone reproduce this? Am I just
doing something wrong with shared arrays?
On a slightly related note: is there now a way to create a random
shared array? https://github.com/JuliaLang/julia/pull/4939 and the
latest docs don't mention this.
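From skimming the PR, my guess would be something with the init keyword
and localindexes, roughly like below, but I don't know if that's the
intended way:

    # Guess based on the PR, not verified: each participating process
    # fills its own chunk of the array in parallel
    A = SharedArray(Uint, (n, n), init = S -> begin
            for i in localindexes(S)
                S[i] = rand(Uint)
            end
        end)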