I would think blockwise multiplication (which is, by the way, has a standard algorithm in Matrix computations by Van Loan and Golub), is pretty pointless with Mahout since there's no blockwise matrix format presently, and even if it were, no existing algorithms support it. All prep utils only produce row-wise format. We could write a routine to "block" it but it would seem to be an exercise in futility.
Second remark is that blockwise multiplication is also pointless for sufficiently sparse matrices. Indeed, sum of outer products of columns and rows with intermediate reduction in combiners is by far most promising in terms of shuffle/sort io. Outer products, when further split in columns or rows, would also be quite sparse and hence small in size while reduction in keyset cardinality is just gigantic compared to blockwise multiplications. (That said, i never ran comparison benchmark of the two) Note that what authors essentially are suggesting (even in strategy 4) that there is explosive growth of shuffle and sort keyset i/o, and what's more, they say they never tried it in distributed mode(!). imagine hundreds of machines sending a copy of their input to a lot of other machines in the cluster. Summing outer products avoids broadcasting the input to multiple reducers. On another note, if input is similarly partitioned (not always the case), then map-side multiplication will always be I/O superior to reduce-side multiplication since while I/O is less and especially less in the keyset cardinality undergoing thru sorters. The power of map-side operations comes from the notion that yes we require a lot from the input but no, it's not a lot if input is already part of a bigger MR pipeline. Finally, back to 0.20/0.21 issue... I said before in this thread that migrating to 0.21 would render Mahout incompatible with majority of production frameworks out there. But after working with ssvd code, i came to think of a compromise: since most of the production environments are running Cloudera distribution, many 0.21 things are supported there and there's a lot of code around that's written for new API which is backported in Cloudera. It's difficult for me to judge how much Cloudera's implementation covers of what is in 0.21 (in fact, i did come across a couple of 0.21 things still missing in CDH), but in terms of Hadoop compatibility, i think Mahout project would be best served if it indeed moved on to a new api (i.e. 0.21 ) but would not get ahead of what is supported in CDH3. That would keep it on the edge of what's currently practical and out there. Keeping sitting on the old api IMO is definitely a drag. My stochastic svd code is using new api in CDH3 and i would very much not want to backport it to old api, it would not be practical as everyone out there is on CDH and more so than on 0.20.2. -Dmitriy > Some more general remarks: I think the matrix multiplication can be >> implemented more efficiently. I've done a matrix multiplication of a sparse >> 500kx15k matrix with around 35 million elements on a quite powerful cluster >> of 10 nodes, and this took around 30 minutes. I have no idea of the >> performance of the implementation described at >> http://homepage.mac.com/j.norstad/matrix-multiply/index.html, so I can't >> really compare. But Imho this can be improved ( though it's possible that >> the poor performance was due to mistakes made by me ) >> > I will definitely investigate these methods over the coming days, these > look fantastic. > > Shannon >
