For dense matrix multiplication, the key observation is that you have O(n^3) arithmetic operations but only O(n^2) element fetches, so each element can in principle be reused O(n) times. Most conventional machines now have a ratio of roughly 10^2 or more between the speed of the arithmetic unit and the speed of memory, so for n > 100 you should be able to saturate the processor's arithmetic unit with carefully written code.
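To make "carefully written" concrete, here is a minimal single-node sketch of the classic cache-blocked loop nest (plain Java; the method name and block size are illustrative only, not code from any particular library):

    // Computes C += A * B for n x n arrays; the caller zeroes C first.
    // With block size blk, each blk x blk block of A and B is pulled
    // into cache once per block product and then reused about blk
    // times, so main-memory traffic drops by roughly a factor of blk
    // versus the naive triple loop.
    static void blockedMultiply(double[][] a, double[][] b, double[][] c,
                                int n, int blk) {
        for (int i0 = 0; i0 < n; i0 += blk) {
            for (int k0 = 0; k0 < n; k0 += blk) {
                for (int j0 = 0; j0 < n; j0 += blk) {
                    int iMax = Math.min(i0 + blk, n);
                    int kMax = Math.min(k0 + blk, n);
                    int jMax = Math.min(j0 + blk, n);
                    for (int i = i0; i < iMax; i++) {
                        for (int k = k0; k < kMax; k++) {
                            double aik = a[i][k];
                            for (int j = j0; j < jMax; j++) {
                                c[i][j] += aik * b[k][j];
                            }
                        }
                    }
                }
            }
        }
    }

Choose blk so that three blk x blk blocks of doubles fit comfortably in cache; the right value depends on the machine, so measure rather than guess.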
Unfortunately, naively written code turns this into O(n^3) memory operations, with an attendant slowdown of multiple orders of magnitude. The problem gets much worse on multi-processors, and worse still on networked clusters, where communication bandwidth is far more limited. The key, therefore, is to reuse each data element as many times as possible.

If you use row and column decomposition with dot products, as you suggest, there is some potential for reuse, but it is hard to realize. The basic intuition is that each row of the left-hand matrix has to be multiplied against every column of the right-hand matrix. That gives you reuse of the left-hand elements, but it increases the number of copies of the right-hand matrix that you have to send around, which defeats the gain.

An easier way to actually get the gain is a block decomposition of the input matrices. If the blocks are sufficiently large, you get your reuse simply by running an efficient machine-level multiply (like the blocked loop above) on each node. The blocks can be nearly square or can be strips; nearly square blocks are probably a bit easier to get full speed with.

A more serious question is whether the inherent n^2 nature of the problem (the data alone is O(n^2) elements) will defeat you before you really run out of parallel-processing steam. Are you sure that you need dense matrices? If it turns out that you don't, you have a very different kind of problem.

The same block decomposition strategy that applies to dense problems also applies to sparse problems, but you have to be careful to balance the number of non-zero elements across the blocks. For many information retrieval and social-algorithm problems, the data follows something like Zipf's law, which can mean very serious non-uniformity in the distribution of non-zero elements. The trick for dealing with this is often to block-decompose a random permutation of the matrices, which smooths out the non-uniformity (a sketch follows at the end of this note).

You should also remember that text retrieval engines are essentially sparse matrix multiplication engines. You may be able to "cheat" a bit by using an engine such as Lucene with custom scoring functions, or you might gain some benefit or inspiration from its code. Retrieval engines are particularly useful for maintaining the large matrices, since maintenance becomes particularly difficult at scale (probably harder than the multiplication itself, in fact).

I would love to hear more about what you are trying to do and how things turn out for you. Please feel free to contact me off-line for more detailed discussion that would be of less interest to the whole group.
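Since you may hit the Zipf issue, here is the promised rough sketch of the permutation step (plain Java again; the names randomPermutation, rowPerm, innerPerm, colPerm, and blockSize are just illustrative, and the block-routing rule is shown only in comments):

    import java.util.Random;

    // Fisher-Yates shuffle: returns a uniformly random permutation of
    // 0..n-1. Both sides of the multiplication must use the same seed
    // so that the permutation applied to A's columns matches the one
    // applied to B's rows.
    static int[] randomPermutation(int n, long seed) {
        int[] p = new int[n];
        for (int i = 0; i < n; i++) {
            p[i] = i;
        }
        Random rnd = new Random(seed);
        for (int i = n - 1; i > 0; i--) {
            int j = rnd.nextInt(i + 1);
            int t = p[i];
            p[i] = p[j];
            p[j] = t;
        }
        return p;
    }

    // A non-zero (i, j, value) of A is then routed to block
    //   (rowPerm[i] / blockSize, innerPerm[j] / blockSize)
    // and a non-zero (j, k, value) of B to block
    //   (innerPerm[j] / blockSize, colPerm[k] / blockSize).
    // The inner permutation cancels in the product, so you only have
    // to invert rowPerm and colPerm on the result.

Because the heavy rows and columns land in random blocks, the expected number of non-zeros per block is nearly uniform, which is exactly what keeps the per-node work balanced.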
On 12/28/07 12:52 PM, "Milan Simonovic" <[EMAIL PROTECTED]> wrote:

> Hi all,
>
> I'm trying to implement a clustering algorithm on hadoop. Among
> other things, there're a lot of matrix multiplications. LINA
> (http://wiki.apache.org/lucene-hadoop/Lina) is probably going to be
> a perfect fit here, but I can't afford to wait. Btw, I can't find
> HADOOP-1655 any more, what's going on?
>
> Using the ordinary matrix product (sum of row by column products
> gives one element from the resulting matrix), the easiest way to
> formulate this computation is to have one row and one column sent to
> a mapper and the output would be one element from the resulting
> matrix. Reducer can take this element and put it into the correct
> position in the output file.
>
> I need your advice on how to design input file(s) and how to make
> input splits then. I'd like to have matrices in separate files
> (they'll be used for more than one multiplication, and it's cleaner
> to have them separate).
>
> I guess then I'd have to use MultiFileSplit and MultiFileInputFormat
> somehow. Is it possible at all to send two records (one row and
> one column, or two rows if the other matrix is column-oriented
> ordered) from two input splits to a single mapper? Or should I look
> for an alternative way to multiply matrixes?