NMF should be amenable to the stochastic decomposition trick. That reduces the problem to a much smaller factorization that you could plausibly do using sequential techniques. Jake Manning is working on getting factorizations going, but I don't know if he has gotten to NMF.
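Roughly, the kind of reduction I mean looks like this (just illustrative numpy with made-up sizes, and assuming the usual random-projection flavor of the trick, not anything in Mahout yet):

import numpy as np

# A stands in for the big matrix we actually want to factor.
A = np.random.rand(10000, 500)

# Project onto a small random subspace: k is the target rank plus a bit
# of oversampling.
k = 60
omega = np.random.randn(A.shape[1], k)
Y = A @ omega                  # 10000 x k, one cheap pass over A
Q, _ = np.linalg.qr(Y)         # orthonormal basis for the range of Y

# B is small enough to hand to an ordinary sequential factorization.
B = Q.T @ A                    # k x 500, and A is approximately Q @ B

The catch for NMF specifically is that the projection doesn't preserve nonnegativity, so mapping factors of B back up to factors of A takes some extra care.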
Jake's comment about doing the multiplication cleanly using sparse techniques is important, but you should also remember that if you have a sparse matrix stored in triple form, doing a transpose is simply a matter of changing which index you consider a row and which a column. For some algorithms, it might pay to sort the result once so that you can do a sort-join for the multiplication, but that isn't hard.

On Tue, Dec 8, 2009 at 5:43 PM, Zhengguo 'Mike' SUN <[email protected]> wrote:

> Hi Ted, John,
>
> I am also interested in matrix multiplication. Actually, I am implementing
> Non-negative Matrix Factorization, which is basically iterations of matrix
> multiplication. What I did is exactly what Ted said: doing an outer product
> using one column from the left matrix and one row from the right matrix and
> summing up all the results. Ideally, this could be done using only one job:
> mappers doing the multiplication and reducers doing the summing. But the
> question is how to match a column from the left matrix with a row from the
> right matrix. Assuming the two input matrices are given in column-major and
> row-major order, it seems to me that you have to implement a customized
> InputFormat to do the matching. Or maybe you could use one matrix as input
> and read the other inside the mappers, but that also has problems when you
> have a lot of mappers. The straightforward way seems to be to use two jobs.
> The mappers of the first job emit the id and contents of the row or column
> they read, and the reducers do the multiplication. The second job does the
> summing. This is kind of similar to what John did. But it is inefficient if
> you have many iterations. Any comments?
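To make the triple-form transpose and sort-join point above concrete, here is a tiny in-memory sketch (plain Python with made-up names, not Mahout code; a real implementation would of course stay out of memory):

from collections import defaultdict

# A sparse matrix as a list of (row, col, value) triples.
def transpose(triples):
    # Transposing is just swapping which index is the row and which is the column.
    return [(c, r, v) for (r, c, v) in triples]

def multiply(a_triples, b_triples):
    # Group both inputs on the join key: columns of A against rows of B.
    a_by_col = defaultdict(list)
    for r, c, v in a_triples:
        a_by_col[c].append((r, v))
    b_by_row = defaultdict(list)
    for r, c, v in b_triples:
        b_by_row[r].append((c, v))

    # For each shared index k, emit the outer product of column k of A
    # with row k of B, summing collisions as we go.
    result = defaultdict(float)
    for k in a_by_col:
        for i, av in a_by_col[k]:
            for j, bv in b_by_row.get(k, ()):
                result[(i, j)] += av * bv
    return [(i, j, v) for (i, j), v in result.items()]

Grouping on the shared index is just the in-memory analogue of the sort-join; on disk you would sort both triple lists by that index once and walk them in step.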

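For what it's worth, the two-job structure Mike describes can be sketched locally like this (plain Python standing in for the map and reduce phases; the tags and record formats are invented for illustration):

def job1_map(record):
    # Records are assumed to look like ('A', i, k, v) for the left matrix
    # and ('B', k, j, v) for the right one.
    tag, a, b, v = record
    if tag == 'A':
        i, k = a, b
        yield k, ('A', i, v)
    else:
        k, j = a, b
        yield k, ('B', j, v)

def job1_reduce(k, values):
    # One reducer call sees column k of A and row k of B together,
    # so it can emit the terms of their outer product.
    a_vals = [(i, v) for tag, i, v in values if tag == 'A']
    b_vals = [(j, v) for tag, j, v in values if tag == 'B']
    for i, av in a_vals:
        for j, bv in b_vals:
            yield (i, j), av * bv

def job2_reduce(ij, partials):
    # The second job just sums the partial products for each output cell.
    yield ij, sum(partials)

The framework does the grouping on the intermediate keys; the point is only that the first job's reducer sees matching column/row pairs and the second job's reducer sees all partial products for a given output cell.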