Off-line user-based analysis is quite feasible, however.

We worked with data larger than this at Veoh and could crunch it down
to usable form in 10 hours on a 20-core micro-cluster.

The key step is computing sparse co-occurrences and filtering for
interesting non-zero values.
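As a minimal sketch of that step (not the actual Veoh pipeline): count item pairs that co-occur within a user's history, then filter the sparse counts with a simple threshold. The function name, threshold, and toy data below are all hypothetical; in practice the "interesting" filter is often a statistical test such as the log-likelihood ratio rather than a raw count cutoff.

```python
from collections import Counter
from itertools import combinations

def cooccurrences(user_items, min_count=2):
    """Count item pairs co-occurring in each user's history,
    then keep only pairs seen at least min_count times."""
    counts = Counter()
    for items in user_items.values():
        # deduplicate and sort so each pair is counted once per user
        for a, b in combinations(sorted(set(items)), 2):
            counts[(a, b)] += 1
    return {pair: c for pair, c in counts.items() if c >= min_count}

# toy example: three users' item histories (hypothetical data)
histories = {
    "u1": ["a", "b", "c"],
    "u2": ["a", "b"],
    "u3": ["b", "c"],
}
print(cooccurrences(histories))  # → {('a', 'b'): 2, ('b', 'c'): 2}
```

At real scale this loop becomes a distributed job (e.g. map-reduce over user histories), but the shape of the computation is the same.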

On Fri, Jul 24, 2009 at 2:11 AM, Sean Owen <[email protected]> wrote:

> Hundreds of millions of users is big indeed. Sounds like you have way
> more users than items. This tells me that any user-based algorithm is
> probably out of the question. The model certainly can't be loaded into
> memory on one machine. We could work on ways to compute all pairs of
> similarities in a distributed way, but that's trillions of
> similarities, even after filtering out some unnecessary work.
>



-- 
Ted Dunning, CTO
DeepDyve
