No explanation for the uneven distribution of the 40 milliseconds, though.

On Thu, Mar 27, 2014 at 6:11 PM, Amit Murthy <[email protected]> wrote:

> There is a pattern here. For each set of pids, the times sum to roughly 40
> milliseconds (32+8 with two workers; 31+4+4+1 with four; 20+4+1+3+4+1+2+4
> with eight). In a SharedArray, RemoteRefs to the shmem mappings on each of
> the workers are maintained on the creating pid (in this case 1). I think
> the workers are referring back to pid 1 to fetch their local mapping when
> the shared array object is passed in the remotecall_fetch call, and hence
> all the workers end up waiting for pid 1 to become free to service these
> lookups.
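>
> One way to test this guess (a rough sketch, untested) is to run the same
> timing loop without passing A at all; if the per-pid skew disappears, the
> cost is tied to shipping the shared array object:
>
> for i in 1:100
>     t1 = time(); p = 2 + (i % 2); remotecall_fetch(p, ()->1); t2 = time()
>     # no SharedArray argument, so no mapping lookup back to pid 1
>     println("@ $p ", int((t2 - t1) * 1000))
> end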
>
>
> On Thu, Mar 27, 2014 at 5:58 PM, Amit Murthy <[email protected]> wrote:
>
>> Some more weirdness
>>
>> Starting with julia -p 8
>>
>> A=Base.shmem_fill(1, (1000,1000))
>>
>> Using 2 workers:
>> for i in 1:100
>>     t1 = time(); p = 2 + (i % 2); remotecall_fetch(p, x->1, A); t2 = time()
>>     println("@ $p ", int((t2 - t1) * 1000))
>> end
>>
>> prints
>>
>> ...
>> @ 3 8
>> @ 2 32
>> @ 3 8
>> @ 2 32
>> @ 3 8
>> @ 2 32
>> @ 3 8
>> @ 2 32
>>
>>
>> Notice that pid 2 always takes 32 milliseconds while pid 3 always takes 8.
>>
>>
>>
>> With 4 workers:
>>
>> for i in 1:100
>>     t1 = time(); p = 2 + (i % 4); remotecall_fetch(p, x->1, A); t2 = time()
>>     println("@ $p ", int((t2 - t1) * 1000))
>> end
>>
>> ...
>> @ 2 31
>> @ 3 4
>> @ 4 4
>> @ 5 1
>> @ 2 31
>> @ 3 4
>> @ 4 4
>> @ 5 1
>> @ 2 31
>> @ 3 4
>> @ 4 4
>> @ 5 1
>> @ 2 31
>>
>>
>> Now pid 2 always takes 31 milliseconds, pids 3 and 4 take 4 milliseconds
>> each, and pid 5 takes 1 millisecond.
>>
>> With 8 workers:
>>
>> for i in 1:100
>>     t1 = time(); p = 2 + (i % 8); remotecall_fetch(p, x->1, A); t2 = time()
>>     println("@ $p ", int((t2 - t1) * 1000))
>> end
>>
>> ....
>> @ 2 20
>> @ 3 4
>> @ 4 1
>> @ 5 3
>> @ 6 4
>> @ 7 1
>> @ 8 2
>> @ 9 4
>> @ 2 20
>> @ 3 4
>> @ 4 1
>> @ 5 3
>> @ 6 4
>> @ 7 1
>> @ 8 2
>> @ 9 4
>> @ 2 20
>> @ 3 4
>> @ 4 1
>> @ 5 3
>> @ 6 4
>> @ 7 1
>> @ 8 3
>> @ 9 4
>> @ 2 20
>> @ 3 4
>> @ 4 1
>> @ 5 3
>> @ 6 4
>>
>>
>> Pid 2 always takes 20 milliseconds, while the rest are quite consistent too.
>>
>> Any explanations?
>>
>>
>> On Thu, Mar 27, 2014 at 5:24 PM, Amit Murthy <[email protected]> wrote:
>>
>>> I think the code does not do what you want.
>>>
>>> In the non-shared case you are sending a 10^6-element integer array over
>>> the network 1000 times and summing it as many times. Most of the time is
>>> network traffic. Reduce 'n' to, say, 10, and you will see what I mean.
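>>>
>>> For example, a rough sketch of how to see the transfer cost in isolation
>>> (worker id 2 and the array B here are just illustrations, not your code):
>>>
>>> B = ones(Int, 1000, 1000)            # ~8 MB payload as a plain Array
>>> @time remotecall_fetch(2, x->1, B)   # serializes and ships B each call
>>> @time remotecall_fetch(2, ()->1)     # same call with no payload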
>>>
>>> In the shared case you are not sending the array over the network, but
>>> you are still summing the entire array 1000 times. Some of the
>>> remotecall_fetch calls seem to be taking 40 milliseconds of extra time,
>>> which adds to the total.
>>>
>>> The shared time of 6 seconds being less than the 15 seconds for the
>>> non-shared case seems to be incidental.
>>>
>>> I don't yet have an explanation for the extra 40 milliseconds per
>>> remotecall_fetch (for some calls only) in the shared case.
>>>
>>> On Thu, Mar 27, 2014 at 2:50 PM, Mikael Simberg
>>> <[email protected]> wrote:
>>>
>>>> Hi,
>>>> I'm having some trouble figuring out exactly how I'm supposed to use
>>>> SharedArrays - I might just be misunderstanding them, or something odd
>>>> is happening with them.
>>>>
>>>> I'm trying to do some parallel computing which looks a bit like this
>>>> test case:
>>>>
>>>> function createdata(shared)
>>>>     const n = 1000
>>>>     if shared
>>>>         A = SharedArray(Uint, (n, n))
>>>>     else
>>>>         A = Array(Uint, (n, n))
>>>>     end
>>>>     for i = 1:n, j = 1:n
>>>>         A[i, j] = rand(Uint)
>>>>     end
>>>>
>>>>     return n, A
>>>> end
>>>>
>>>> function mainfunction(r; shared = false)
>>>>     n, A = createdata(shared)
>>>>
>>>>     i = 1
>>>>     nextidx() = (idx = i; i += 1; idx)
>>>>
>>>>     @sync begin
>>>>         for p in workers()
>>>>             @async begin
>>>>                 while true
>>>>                     idx = nextidx()
>>>>                     if idx > r
>>>>                         break
>>>>                     end
>>>>                     found, s = remotecall_fetch(p, parfunction, n, A)
>>>>                 end
>>>>             end
>>>>         end
>>>>     end
>>>> end
>>>>
>>>> function parfunction(n::Int, A::Array{Uint, 2})
>>>>     # possibly do some other computation here independent of shared arrays
>>>>     s = sum(A)
>>>>     return false, s
>>>> end
>>>>
>>>> function parfunction(n::Int, A::SharedArray{Uint, 2})
>>>>     s = sum(A)
>>>>     return false, s
>>>> end
>>>>
>>>> If I then start julia with e.g. two worker processes, so julia -p 2, the
>>>> following happens:
>>>>
>>>> julia> require("testpar.jl")
>>>>
>>>> julia> @time mainfunction(1000, shared = false)
>>>> elapsed time: 15.717117365 seconds (8448701068 bytes allocated)
>>>>
>>>> julia> @time mainfunction(1000, shared = true)
>>>> elapsed time: 6.068758627 seconds (56713996 bytes allocated)
>>>>
>>>> julia> rmprocs([2, 3])
>>>> :ok
>>>>
>>>> julia> @time mainfunction(1000, shared = false)
>>>> elapsed time: 0.717638344 seconds (40357664 bytes allocated)
>>>>
>>>> julia> @time mainfunction(1000, shared = true)
>>>> elapsed time: 0.702174085 seconds (32680628 bytes allocated)
>>>>
>>>> So, with a normal array it's slow as expected, and it's faster with the
>>>> shared array. But with the normal array CPU usage is 100% on two cores,
>>>> while with the shared array CPU usage spikes for a fraction of a second
>>>> and then sits at around 10% for the remaining nearly 6 seconds. Can
>>>> anyone reproduce this? Am I just doing something wrong with shared
>>>> arrays?
>>>>
>>>> On a slightly related note: is there now a way to create a random shared
>>>> array? https://github.com/JuliaLang/julia/pull/4939 and the latest docs
>>>> don't mention one.
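>>>>
>>>> Something like this sketch is what I have in mind (untested, and assuming
>>>> the init keyword and localindexes from that PR work this way):
>>>>
>>>> S = SharedArray(Uint, (1000, 1000),
>>>>                 init = S -> (for i in localindexes(S); S[i] = rand(Uint); end))
>>>> # each worker fills only its own local chunk of S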
>>>>
>>>
>>>
>>
>
