I would think blockwise multiplication (which is, by the way, has a standard
algorithm in Matrix computations by Van Loan and Golub), is pretty pointless
with Mahout since there's no blockwise matrix format presently, and even if
it were, no existing algorithms support it. All prep utils only produce
row-wise format. We could write a routine to "block" it but it would seem to
be an exercise in futility.

Second remark is that blockwise multiplication is also pointless for
sufficiently sparse matrices. Indeed, sum of outer products of columns and
rows with intermediate reduction in combiners is by far most promising in
terms of shuffle/sort io. Outer products, when further split in columns or
rows, would also be quite sparse and hence small in size while reduction
in keyset cardinality is just gigantic compared to blockwise
multiplications. (That said, i never ran comparison benchmark of the two)

Note that what authors essentially are suggesting (even in strategy 4)
that there is explosive growth of shuffle and sort keyset i/o, and what's
more, they say they never tried it in distributed mode(!). imagine hundreds
of machines sending a copy of their input to a lot of other machines in the
cluster. Summing outer products avoids broadcasting the input to multiple
reducers.

On another note, if input is similarly partitioned (not always the case),
then map-side multiplication will always be I/O superior to reduce-side
multiplication since while I/O is less and especially less in the keyset
cardinality undergoing thru sorters. The power of map-side operations comes
from the notion that yes we require a lot from the input but no, it's not a
lot if input is already part of a bigger MR pipeline.

Finally, back to 0.20/0.21 issue... I said before in this thread that
migrating to 0.21 would render Mahout incompatible with majority of
production frameworks out there. But after working with ssvd code, i came to
think of a compromise: since most of the production environments are running
Cloudera distribution, many 0.21 things are supported there and there's a
lot of code around that's written for new API which is backported in
Cloudera. It's difficult for me to judge how much Cloudera's implementation
covers of what is in 0.21 (in fact, i did come across a couple of 0.21
things still missing in CDH), but in terms of Hadoop compatibility, i think
Mahout project would be best served if it indeed moved on to a new api (i.e.
0.21 ) but would not get ahead of what is supported in CDH3. That would keep
it on the edge of what's currently practical and out there. Keeping sitting
on the old api IMO is definitely a drag. My stochastic svd code is using new
api in CDH3 and i would very much not want to backport it to old api, it
would not be practical as everyone out there is on CDH and more so than on
0.20.2.

-Dmitriy



>  Some more general remarks: I think the matrix multiplication can be
>> implemented more efficiently. I've done a matrix multiplication of a sparse
>> 500kx15k matrix with around 35 million elements on a quite powerful cluster
>> of 10 nodes, and this took around 30 minutes. I have no idea of the
>> performance of the implementation described at
>> http://homepage.mac.com/j.norstad/matrix-multiply/index.html, so I can't
>> really compare. But Imho this can be improved ( though it's possible that
>> the poor performance was due to mistakes made by me )
>>
> I will definitely investigate these methods over the coming days, these
> look fantastic.
>
> Shannon
>

Reply via email to