The canonical algorithms, certainly all the ones I know of, compute recommendations as a function of essentially all the input data. They're not inherently distributable, no.
I think they can all be reimagined as distributed processes, with enough care; the output remains a function of all the data. The distributed form is slower, though, so computing a single recommendation over a huge data set by distributing it is infeasibly slow. In practice you compute lots of recommendations at once, perhaps all of them, to amortize the overhead. And when you do all the work at once, the distributed process can actually be pretty efficient.

For example, the co-occurrence-based distributed recommender is really just a simplistic item-based recommender. You can see how much the form and characteristics change in the translation. (There's a small sketch of what I mean below the quoted message.)

On Wed, May 5, 2010 at 3:59 PM, First Qaxy <qa...@yahoo.ca> wrote:
> Out of curiosity - sorry if this has been answered before - would it be
> possible to combine the two approaches, so you could break the data set into
> batches that fit in memory, use a non-distributed algorithm to produce
> results for each batch, and then use Hadoop to merge the results in a
> sensible way? This would improve performance while scaling (this is different
> from the pseudo approach, where you simply distribute the work on the same
> model). I haven't given it much thought, but I think this might work in some
> limited cases.
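Here is a minimal in-memory sketch of what I mean by "really just a simplistic item-based recommender" (plain Java with made-up names, not Mahout's actual classes): the "similarity" between two items is simply the number of users who have both, and a user's scores are the co-occurrence matrix multiplied by that user's preference vector. The Hadoop version builds the same matrix and does that multiplication for every user at once, which is where the amortization comes from.

import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only, not Mahout code.
public class CooccurrenceSketch {

  // itemA -> (itemB -> number of users who have both A and B)
  static Map<Long, Map<Long, Integer>> cooccurrences(List<List<Long>> usersItems) {
    Map<Long, Map<Long, Integer>> counts = new HashMap<>();
    for (List<Long> items : usersItems) {
      for (long a : items) {
        for (long b : items) {
          if (a != b) {
            counts.computeIfAbsent(a, k -> new HashMap<>()).merge(b, 1, Integer::sum);
          }
        }
      }
    }
    return counts;
  }

  // Score unseen items for one user: co-occurrence matrix times preference vector.
  static Map<Long, Integer> scores(Map<Long, Map<Long, Integer>> counts, List<Long> userItems) {
    Map<Long, Integer> result = new HashMap<>();
    for (long seen : userItems) {
      for (Map.Entry<Long, Integer> e : counts.getOrDefault(seen, Map.of()).entrySet()) {
        if (!userItems.contains(e.getKey())) {
          result.merge(e.getKey(), e.getValue(), Integer::sum);
        }
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // Toy data: each inner list is one user's items.
    List<List<Long>> data = List.of(
        List.of(1L, 2L, 3L),
        List.of(1L, 3L),
        List.of(2L, 3L, 4L));
    Map<Long, Map<Long, Integer>> counts = cooccurrences(data);
    // Recommendations for a user who has items 1 and 2: item 3 scores highest.
    System.out.println(scores(counts, List.of(1L, 2L)));
  }
}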