On Tue, May 4, 2010 at 2:01 PM, Sean Owen <sro...@gmail.com> wrote:

> On Tue, May 4, 2010 at 9:53 PM, First Qaxy <qa...@yahoo.ca> wrote:
> > Purely based on estimates, 5 billion transactions, 5 million users,
> > and 100K normally distributed products are expected to create a sparse
> > item-to-item matrix of up to 10 million significant co-occurrences
> > (significance is not defined globally but in the context of the active
> > item to recommend from; in other words, support can be really tiny,
> > confidence less so).
>
> Sounds like a pretty solid size of a data set. I think the recommender
> will work fine on this -- well, I suppose it depends on your
> expectations, but this whole piece has been completely revised recently
> and I feel that it's tuned nicely now.
>

For this scale, random projection SVD algorithms can even work in R. With
5 billion transactions, you will need a partial out-of-core implementation,
but I would strongly expect the decomposition and even any follow-on
clustering to be almost entirely I/O bound just reading your original data.
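
To make the idea concrete, here is a minimal sketch of random projection SVD
in plain NumPy (not the Mahout code and not R; the toy matrix and the
rank/oversampling parameters are purely illustrative assumptions):

import numpy as np

def randomized_svd(A, k, p=10, seed=0):
    """Approximate rank-k SVD of A via a random projection."""
    rng = np.random.default_rng(seed)
    omega = rng.standard_normal((A.shape[1], k + p))  # random test matrix
    Y = A @ omega                       # first pass over A
    Q, _ = np.linalg.qr(Y)              # orthonormal basis for range(Y)
    B = Q.T @ A                         # second (and last) pass over A
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :k], s[:k], Vt[:k, :]

# Toy usage: a low-rank stand-in for the real co-occurrence matrix.
rng = np.random.default_rng(1)
A = rng.standard_normal((5000, 20)) @ rng.standard_normal((20, 1000))
U, s, Vt = randomized_svd(A, k=20)
print(np.allclose(A, U @ np.diag(s) @ Vt, atol=1e-6))  # ~exact for rank-20 data

The thing to notice is that A is only touched in two sequential passes
(A @ omega and Q.T @ A), which is why an out-of-core version of this tends
to be I/O bound on reading the original data.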
