The canonical algorithms, certainly all the ones I know of, compute recommendations as a function of essentially all the input data. They're not inherently distributable, no.
I think they can all be reimagined as distributed processes, with enough care; the output remains a function of all the data. The distributed form is slower, though, so computing a single recommendation over a huge data set by distributing it is infeasibly slow. In practice you compute lots of recommendations at once, perhaps all of them, to amortize the overhead. And when you do all the work at once, the distributed process can actually be pretty efficient.

For example, the co-occurrence-based distributed recommender is really just a simplistic item-based recommender. You can see how much the form and characteristics change in the translation. (There's a small sketch of what I mean below the quoted message.)

On Wed, May 5, 2010 at 3:59 PM, First Qaxy <qa...@yahoo.ca> wrote:
> Out of curiosity - sorry if this has been answered before - would it be
> possible to combine the two approaches, so you could break the data set into
> batches that fit in memory, use a non-distributed algorithm to produce
> results for each batch, and then use Hadoop to merge the results in a
> sensible way? This would improve performance while scaling (this is different
> from the pseudo approach, where you simply distribute the work on the same
> model). I haven't given it much thought, but I think this might work in some
> limited cases.
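Here is a minimal in-memory sketch of what I mean by "really just a simplistic item-based recommender" (plain Java with made-up names, not Mahout's actual classes): the "similarity" between two items is simply the number of users who have both, and a user's scores are the co-occurrence matrix multiplied by that user's preference vector. The Hadoop version builds the same matrix and does that multiplication for every user at once, which is where the amortization comes from.

import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only, not Mahout code.
public class CooccurrenceSketch {

  // itemA -> (itemB -> number of users who have both A and B)
  static Map<Long, Map<Long, Integer>> cooccurrences(List<List<Long>> usersItems) {
    Map<Long, Map<Long, Integer>> counts = new HashMap<>();
    for (List<Long> items : usersItems) {
      for (long a : items) {
        for (long b : items) {
          if (a != b) {
            counts.computeIfAbsent(a, k -> new HashMap<>()).merge(b, 1, Integer::sum);
          }
        }
      }
    }
    return counts;
  }

  // Score unseen items for one user: co-occurrence matrix times preference vector.
  static Map<Long, Integer> scores(Map<Long, Map<Long, Integer>> counts, List<Long> userItems) {
    Map<Long, Integer> result = new HashMap<>();
    for (long seen : userItems) {
      for (Map.Entry<Long, Integer> e : counts.getOrDefault(seen, Map.of()).entrySet()) {
        if (!userItems.contains(e.getKey())) {
          result.merge(e.getKey(), e.getValue(), Integer::sum);
        }
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // Toy data: each inner list is one user's items.
    List<List<Long>> data = List.of(
        List.of(1L, 2L, 3L),
        List.of(1L, 3L),
        List.of(2L, 3L, 4L));
    Map<Long, Map<Long, Integer>> counts = cooccurrences(data);
    // Recommendations for a user who has items 1 and 2: item 3 scores highest.
    System.out.println(scores(counts, List.of(1L, 2L)));
  }
}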