I'm trying with 4 procs and 300x300 dense matrices. I'm not used to git, so
I'm putting the code here:
function cannon_par(a,b) # for square matrices, nworkers() must be set
    s = size(a,1)
    nblocks = nworkers() # number of procs
    size_B = isqrt(nblocks) # size of a,b w.r.t. the blocks
    if size_B^2 != nblocks
        error("nworkers() must be a perfect square")
    end
    if s % size_B != 0
        error("Argument matrices can not be divided equally among nworkers()")
    end
    bs = div(s,size_B) # block size = bs x bs (kept as an integer for indexing)
    A = similar(a) # numeric arrays rather than cell arrays, so blocks can be multiplied
    B = similar(b)
    C = zeros(s,s)
    I(i) = (i-1)*bs+1:i*bs # function for block indexing, block A_ij = A[I(i),I(j)]
    #### initial shifting ####
    for i = 1:size_B
        # shift the ith block row of a left by i-1 blocks
        A[I(i),:] = circshift(a[I(i),:],[0 bs*(1-i)])
        # shift the ith block column of b up by i-1 blocks
        B[:,I(i)] = circshift(b[:,I(i)], bs*(1-i))
    end
    #### A and B are distributed ####
    dA = distribute(A)
    dB = distribute(B)
    #### Cannon iterations ####
    for k = 1:size_B
        #tic()
        C_local = pmap(fetch, {@spawnat p localpart(dA)*localpart(dB) for p in procs(dA)})
        #toc()
        #tic()
        for i = 1:size_B
            for j = 1:size_B
                C[I(i),I(j)] += C_local[(j-1)*size_B+i]
            end
        end
        #toc()
        if k < size_B
            #tic()
            A = circshift(A,[0 -bs]) # shifted
            B = circshift(B,-bs)
            dA = distribute(A) # and distributed again
            dB = distribute(B)
            #toc()
        end
    end
    C
end
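
To check the skewing and shifting logic separately from the distribution, a minimal single-process sketch may help. It is not the code above: it is written in current Julia syntax, the names `cannon_serial`, `q` and `blk` are hypothetical (`q` plays the role of `size_B`, `blk` the role of `I`), and every grid position is simply visited in a loop instead of being handled by its own worker.

```julia
# Single-process sketch of Cannon's algorithm on a q x q grid of blocks.
# Assumption: all names here are illustrative and not part of any library.
function cannon_serial(a::AbstractMatrix, b::AbstractMatrix, q::Integer)
    s = size(a, 1)
    (size(a) == (s, s) && size(b) == (s, s)) || error("a and b must be square and of equal size")
    s % q == 0 || error("matrix size must be divisible by q")
    bs = div(s, q)                      # block size = bs x bs
    blk(i) = (i-1)*bs+1:i*bs            # index range of the i-th block
    A = similar(a)
    B = similar(b)
    C = zeros(promote_type(eltype(a), eltype(b)), s, s)
    # Initial skew: block row i of a moves left by i-1 blocks,
    # block column i of b moves up by i-1 blocks.
    for i in 1:q
        A[blk(i), :] = circshift(a[blk(i), :], (0, -bs*(i-1)))
        B[:, blk(i)] = circshift(b[:, blk(i)], (-bs*(i-1), 0))
    end
    for k in 1:q
        # Each grid position (i,j) multiplies the two blocks it currently holds.
        for i in 1:q, j in 1:q
            C[blk(i), blk(j)] += A[blk(i), blk(j)] * B[blk(i), blk(j)]
        end
        # Shift all of A left and all of B up by one block.
        A = circshift(A, (0, -bs))
        B = circshift(B, (-bs, 0))
    end
    return C
end
```

For the sizes mentioned in this thread, `cannon_serial(a, b, 2)` with 300x300 `a` and `b` should agree with `a*b` up to floating-point rounding, which gives a reference result to compare the distributed version against.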
On Sunday, June 22, 2014 16:54:01 UTC+2, Viral Shah wrote:
>
> The communication is probably happening in other parts of the code. How
> large a problem are you trying? Can you post the full code in a gist or a
> git repository? I will try it out. This is a good example to have in our
> manual as well, and I just haven't got around to it.
>
> -viral
>
> On Sunday, June 22, 2014 4:53:02 PM UTC+5:30, Pietro Benedusi wrote:
>>
>> Yes, I'm using the function distribute(). This is the hotspot of my code
>> (C = A*B)
>>
>> C_local = pmap(fetch, {@spawnat p localpart(dA)*localpart(dB) for p in procs(dA)})
>>
>>
>> Is this the right way to proceed? Done this way, the multiplication is very
>> slow (I'm using 4 workers).
>>
>> Many thanks for helping.
>>
>>
>>
>> On Sunday, June 22, 2014 07:14:52 UTC+2, Viral Shah wrote:
>>>
>>> Are you using DArrays? You should be able to move data with indexing.
>>> For the Cannon algorithm, you should be able to organize your communication
>>> so that each processor moves the data it needs - IIRC.
>>>
>>> -viral
>>>
>>> On Saturday, June 21, 2014 11:08:06 PM UTC+5:30, Pietro Benedusi wrote:
>>>>
>>>> Hello,
>>>>
>>>> I need to write a distributed Cannon algorithm for matrix
>>>> multiplication.
>>>> In every iteration I have to shift all the blocks of the involved
>>>> matrices or, equivalently, move blocks between remote procs. How can I
>>>> move blocks from one remote proc to another?
>>>>
>>>> Thanks
>>>>
>>>