I thought it might be worth bringing this back to the user list. Ankur effectively raised issues about the performance of org.apache.mahout.cf.taste.hadoop.item by adding org.apache.mahout.cf.taste.hadoop.cooccurrence, a similar recommender job (also based on item co-occurrence) with a different implementation. ".item" ultimately did not distribute the multiplication of the co-occurrence matrix by each user vector, while ".cooccurrence" distributes it heavily.
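
For anyone who wants the multiply spelled out: a user's recommendation vector is the co-occurrence matrix times that user's preference vector, which is the same as summing each co-occurrence column scaled by the user's preference for that column's item. Each scaled column can be computed independently and summed afterwards, which is what makes the work easy to spread out. Below is a rough plain-Java sketch of that idea, with no Hadoop involved. It is not the Mahout code; the class name, method name, and map-based data layout are all made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not part of Mahout.
public class ColumnwiseMultiplySketch {

    /**
     * Computes (co-occurrence matrix) x (user vector) for every user as a
     * sum of scaled columns.
     *
     * @param columns   itemID -> that item's co-occurrence column (itemID -> count)
     * @param userPrefs userID -> that user's vector (itemID -> preference)
     * @return userID -> recommendation vector (itemID -> score)
     */
    static Map<Long, Map<Long, Double>> recommend(
            Map<Long, Map<Long, Double>> columns,
            Map<Long, Map<Long, Double>> userPrefs) {

        // Step 1: regroup user vector elements by item ID, so that each
        // element ends up next to the matrix column it must multiply.
        Map<Long, Map<Long, Double>> prefsByItem = new HashMap<>();
        for (Map.Entry<Long, Map<Long, Double>> user : userPrefs.entrySet()) {
            for (Map.Entry<Long, Double> pref : user.getValue().entrySet()) {
                prefsByItem
                    .computeIfAbsent(pref.getKey(), k -> new HashMap<>())
                    .put(user.getKey(), pref.getValue());
            }
        }

        // Step 2: for each item, scale its column by each user's preference
        // and add the partial vector into that user's running sum.
        Map<Long, Map<Long, Double>> result = new TreeMap<>();
        for (Map.Entry<Long, Map<Long, Double>> item : prefsByItem.entrySet()) {
            Map<Long, Double> column = columns.get(item.getKey());
            if (column == null) {
                continue; // no co-occurrence data for this item
            }
            for (Map.Entry<Long, Double> pref : item.getValue().entrySet()) {
                Map<Long, Double> sum =
                    result.computeIfAbsent(pref.getKey(), k -> new TreeMap<>());
                for (Map.Entry<Long, Double> cell : column.entrySet()) {
                    sum.merge(cell.getKey(),
                              pref.getValue() * cell.getValue(),
                              Double::sum);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Long, Map<Long, Double>> columns = new HashMap<>();
        columns.put(1L, Map.of(1L, 2.0, 2L, 1.0));
        columns.put(2L, Map.of(1L, 1.0, 2L, 2.0, 3L, 1.0));
        columns.put(3L, Map.of(2L, 1.0, 3L, 1.0));

        Map<Long, Map<Long, Double>> userPrefs = new HashMap<>();
        userPrefs.put(10L, Map.of(1L, 1.0, 3L, 2.0));

        System.out.println(recommend(columns, userPrefs));
    }
}
```

In a real job, step 1 would correspond to keying both kinds of input by item ID, and the summing in step 2 to a later pass keyed by user ID. That mapping is my own reading of the approach, not a description of the actual classes.
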
.item performed the multiply by side-loading the co-occurrence matrix into a reducer, reading it from disk as MapFiles. Accessing columns this way proved to be very slow.

After much experimentation, I've completely overhauled .item by grafting in ideas from .cooccurrence. The result is a sort of best-of-both-worlds hybrid of the two. It borrows a clever way to join two kinds of input in one MapReduce, and uses it to join the co-occurrence matrix columns with the individual elements of each user vector. Each product is output and recombined later. The hybrid retains features of .item, like accommodating user ratings.

Several changes together have sped this up for me by at least a factor of 10: letting Hadoop manage the data flow, even though it takes a bit more copying; avoiding random-access reads from MapFiles; using features like the Combiner; and being smarter about Writables. Most of the gain comes from avoiding MapFiles.

I bring it up since it's interesting, a good development for anyone using this implementation, and, I imagine, an area that is ripe for more testing and improvement.

Sean