For dense matrix multiplication, the key problem is that you have O(n^3)
arithmetic operations but only O(n^2) element fetches.  Most conventional
machines now have a ratio of roughly 10^2 or more between the speed of the
arithmetic processor and the speed of memory, so for n > 100 you should be
able to saturate the arithmetic unit of the processor with carefully
written code.  (For n = 1000, that is about 2 x 10^9 operations on roughly
3 x 10^6 elements, so each element can be reused on the order of a
thousand times.)

Unfortunately, naively written code turns this into O(n^3) memory
operations, with an attendant slowdown of several orders of magnitude.
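
For concreteness, here is the textbook triple loop in plain Java (just an
illustration, not from any library):

    class MatMul {
        // Naive multiply: C = A * B for n x n matrices.  The inner loop
        // walks an entire column of B for every element of C, so B is
        // swept from memory n times; once the matrices outgrow cache,
        // nearly every one of the ~2n^3 operand reads hits main memory.
        static double[][] multiplyNaive(double[][] a, double[][] b) {
            int n = a.length;
            double[][] c = new double[n][n];
            for (int i = 0; i < n; i++) {
                for (int j = 0; j < n; j++) {
                    double sum = 0.0;
                    for (int k = 0; k < n; k++) {
                        sum += a[i][k] * b[k][j];
                    }
                    c[i][j] = sum;
                }
            }
            return c;
        }
    }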

This problem becomes vastly worse on multi-processors, and on networked
clusters things are worse still because the communication bandwidth is so
limited.

The key, therefore, is to use each data element as many times as possible.

If you use row and column decomposition and dot products as you suggest,
there is some potential for re-use, but it is hard to realize.  The basic
intuition is that you have to multiply each row of the left-hand matrix
against every column of the right-hand matrix.  This gives you reuse of
the left-hand elements, but it multiplies the number of copies of the
right-hand matrix that you have to send around, which defeats the gain on
the left-hand side.
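
To put numbers on that: with n x n matrices there are n^2 (row, column)
pairs, each carrying 2n elements, so you ship on the order of 2n^3 matrix
elements in order to do about 2n^3 arithmetic operations.  That is a reuse
factor of roughly one per element moved, which is no better than the naive
loop above.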

An easier way to actually get the gain is to use block decomposition of
the input matrices.  If the input blocks are sufficiently large, you will
get your re-use just by using an efficient machine-level multiply on each
node.  The blocks can be nearly square or can be strips, but nearly square
blocks are probably a bit easier to get full speed on.
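
Here is a minimal single-node sketch of the blocking itself (plain Java
again; the Hadoop plumbing is omitted, and the block size is just a
tunable you would match to your cache or split size):

    class BlockedMatMul {
        // Blocked multiply: accumulate C[I][J] += A[I][K] * B[K][J] over
        // blk x blk tiles.  Each tile of A and B is reused across a whole
        // tile of C, cutting memory traffic by roughly a factor of blk.
        static double[][] multiplyBlocked(double[][] a, double[][] b, int blk) {
            int n = a.length;
            double[][] c = new double[n][n];
            for (int i0 = 0; i0 < n; i0 += blk)
                for (int k0 = 0; k0 < n; k0 += blk)
                    for (int j0 = 0; j0 < n; j0 += blk) {
                        int iMax = Math.min(i0 + blk, n);
                        int kMax = Math.min(k0 + blk, n);
                        int jMax = Math.min(j0 + blk, n);
                        for (int i = i0; i < iMax; i++)
                            for (int k = k0; k < kMax; k++) {
                                double aik = a[i][k];
                                for (int j = j0; j < jMax; j++)
                                    c[i][j] += aik * b[k][j];
                            }
                    }
            return c;
        }
    }

On Hadoop, the same idea means one mapper per (I, K, J) block triple and a
reducer that sums the partial products for each output block (I, J).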

A more serious question is whether the inherent O(n^2) size of the problem
will defeat you before you really run out of parallel-processing steam.
Are you sure that you need dense matrices?

If it turns out that you don't need dense matrices, then you have a very
different kind of problem.  The same sort of block decomposition strategy
that applies to dense problems can be applied to sparse problems, but you
have to be careful to balance the number of non-zero elements across the
blocks.  For many information retrieval and social algorithm problems, the
data is subject to something like Zipf's law, which can mean very serious
non-uniformity in the distribution of non-zero elements.  The trick for
dealing with this is often to do the block decomposition on a random
permutation of the matrices, which avoids the non-uniformity.
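
A minimal sketch of that trick (the class and method names are mine, not
from any Hadoop API): relabel rows and columns with random permutations
before assigning non-zero elements to blocks.

    import java.util.Random;

    class BlockBalancer {
        final int[] rowPerm, colPerm;
        final int blockSize;

        BlockBalancer(int nRows, int nCols, int blockSize, long seed) {
            Random rnd = new Random(seed);
            this.rowPerm = randomPermutation(nRows, rnd);
            this.colPerm = randomPermutation(nCols, rnd);
            this.blockSize = blockSize;
        }

        // Fisher-Yates shuffle of 0..n-1.
        static int[] randomPermutation(int n, Random rnd) {
            int[] p = new int[n];
            for (int i = 0; i < n; i++) p[i] = i;
            for (int i = n - 1; i > 0; i--) {
                int j = rnd.nextInt(i + 1);
                int t = p[i]; p[i] = p[j]; p[j] = t;
            }
            return p;
        }

        // Block key for a non-zero (row, col).  With random relabeling,
        // heavy rows and heavy columns land in uniformly random block
        // rows and block columns instead of clumping together, so the
        // expected number of non-zeros per block evens out.
        String blockKey(int row, int col) {
            return (rowPerm[row] / blockSize) + "," + (colPerm[col] / blockSize);
        }
    }

For a product A * B you need three permutations in total: one for A's
rows, one for B's columns, and a shared one for the inner dimension
(applied identically to A's columns and B's rows so that it cancels in the
dot products).  The result comes out with permuted row and column labels,
which you map back at the end.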

You should also remember that text retrieval engines are essentially
sparse matrix multiplication engines.  You may be able to "cheat" a bit by
using an engine such as Lucene with custom scoring functions, or you might
gain some benefit or inspiration from its code.  Retrieval engines are
particularly useful when it comes to maintaining the large matrices, since
that becomes very difficult at scale (probably harder than doing the
multiplication, in fact).
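
To see why the analogy holds, here is the core of a retrieval engine
reduced to its algebra (plain Java, no Lucene API; the names are mine):

    import java.util.HashMap;
    import java.util.Map;

    // An inverted index is a sparse matrix stored by rows:
    // postings.get(term) holds the non-zeros of one row of the
    // term-by-document weight matrix M.  Scoring a query is then the
    // sparse vector-times-matrix product score = q * M.
    class SparseScorer {
        final Map<String, Map<Integer, Double>> postings = new HashMap<>();

        void add(String term, int docId, double weight) {
            postings.computeIfAbsent(term, t -> new HashMap<>())
                    .put(docId, weight);
        }

        // Exactly the accumulation loop a search engine runs at query
        // time, and also one row of a sparse matrix product.
        Map<Integer, Double> score(Map<String, Double> query) {
            Map<Integer, Double> scores = new HashMap<>();
            for (Map.Entry<String, Double> q : query.entrySet()) {
                Map<Integer, Double> row = postings.get(q.getKey());
                if (row == null) continue;   // all-zero row of M
                for (Map.Entry<Integer, Double> e : row.entrySet()) {
                    scores.merge(e.getKey(), q.getValue() * e.getValue(),
                                 Double::sum);
                }
            }
            return scores;
        }
    }

Feed each sparse row of your left-hand matrix through score() and you get
the corresponding row of the product; a custom scoring function in Lucene
can play the same role.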

I would love to hear more of what you are trying to do and how things turn
out for you.  Please feel free to contact me off-line for more detailed
discussions that would be of less interest to the entire group.


On 12/28/07 12:52 PM, "Milan Simonovic" <[EMAIL PROTECTED]> wrote:

> Hi all,
> 
>   I'm trying to implement a clustering algorithm on Hadoop. Among
>   other things, there are a lot of matrix multiplications. LINA
>   (http://wiki.apache.org/lucene-hadoop/Lina) is probably going to be
>   a perfect fit here, but I can't afford to wait. Btw, I can't find
>   HADOOP-1655 any more, what's going on?
> 
>   Using the ordinary matrix product (the sum of row-by-column
>   products gives one element of the resulting matrix), the easiest
>   way to formulate this computation is to have one row and one column
>   sent to a mapper, with the output being one element of the
>   resulting matrix. The reducer can then take this element and put it
>   into the correct position in the output file.
> 
>   I need your advice on how to design input file(s) and how to make
>   input splits then. I'd like to have matrices in separate files
>   (they'll be used for more than one multiplication, and it's cleaner
>   to have them separate).
> 
>   I guess then I'd have to use MultiFileSplit and MultiFileInputFormat
>   somehow. Is it possible at all to send two records (one row and
>   one column, or two rows if the other matrix is stored in
>   column-major order) from two input splits to a single mapper? Or
>   should I look for an alternative way to multiply matrices?
> 
