Wow, that's a lot of thinking. Maybe we can back up a bit and speak in generalities.
In this world there's no such thing as cramming the data set into memory. SparseMatrix is out. There isn't any efficient way to access arbitrary data at all. You'll think in terms of an algorithm that proceeds as a series of large, bulk operations (e.g. matrix times matrix) which are decomposable into small, independent computations. Simple ideas like "divide everything by the count of all users" can be surprisingly tricky: you have to count, output the total, then inject that value into a new mapreduce (sketched below).

You can see my implementation of a simplistic co-occurrence-based recommender in org.apache.mahout.cf.taste.hadoop. It proceeds mostly in terms of operations on vectors, and it's a little tangled. DistributedRowMatrix may streamline a lot of this; I'm not personally familiar with it, but it sounds like an excellent sort of idea. I operate in terms of Vectors, and "matrices" as lots of Vectors together. There's never an operation whose atomic component is "a whole matrix". That may send you back to the drawing board a bit, but this is the data structure to have in mind on Hadoop. Others, I'm sure, have more nuanced ideas here.
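To make the "count, output, then inject" pattern concrete, here's a minimal sketch of the second pass in plain Hadoop terms. The class name and key/value types are mine for illustration, not anything in Mahout; it assumes the driver has already run a counting job, pulled the total out of a counter, and set it in the Configuration before launching this pass.

import java.io.IOException;

import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapreduce.Mapper;

// Driver glue between the two jobs looks roughly like:
//   countingJob.waitForCompletion(true);
//   long users = countingJob.getCounters().findCounter("stats", "users").getValue();
//   secondJob.getConfiguration().setLong("userCount", users);
public final class DivideByUserCountMapper
    extends Mapper<LongWritable, FloatWritable, LongWritable, FloatWritable> {

  private float userCount;

  @Override
  protected void setup(Context context) {
    // the value computed by the first job, injected via the Configuration
    userCount = context.getConfiguration().getLong("userCount", 1L);
  }

  @Override
  protected void map(LongWritable itemID, FloatWritable value, Context context)
      throws IOException, InterruptedException {
    // the actual "divide everything" step, now that the global count is known
    context.write(itemID, new FloatWritable(value.get() / userCount));
  }
}

Two jobs, a counter, and a Configuration value, just to do one global division; the kind of thing that's a one-liner in memory is a whole extra pass here.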

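And to make the "lots of Vectors" point concrete: the natural unit of work in a mapper is one row. Here's a sketch of matrix-times-vector in that style. Again, the class is mine and not Mahout's, the Mahout package names may differ by version, and it assumes rows arrive as (row index, VectorWritable) pairs with a right-hand-side vector small enough to hold in memory.

import java.io.IOException;

import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.mahout.math.DenseVector;
import org.apache.mahout.math.Vector;
import org.apache.mahout.math.VectorWritable;

public final class RowTimesVectorMapper
    extends Mapper<IntWritable, VectorWritable, IntWritable, DoubleWritable> {

  private Vector rightHandSide;

  @Override
  protected void setup(Context context) {
    // placeholder: in a real job this would come from the DistributedCache
    // or an HDFS side file rather than being hard-coded
    rightHandSide = new DenseVector(new double[] {1.0, 2.0, 3.0});
  }

  @Override
  protected void map(IntWritable rowIndex, VectorWritable row, Context context)
      throws IOException, InterruptedException {
    // each call touches exactly one row; nothing ever sees the whole matrix
    context.write(rowIndex, new DoubleWritable(row.get().dot(rightHandSide)));
  }
}

There doesn't even need to be a reducer; the output is the result "vector", which is itself just a pile of (index, value) pairs.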