On Tue, May 4, 2010 at 2:01 PM, Sean Owen <sro...@gmail.com> wrote:
> On Tue, May 4, 2010 at 9:53 PM, First Qaxy <qa...@yahoo.ca> wrote:
> > Purely based on estimates, assuming 5 billion transactions, 5 million
> > users, 100K products normally distributed are expected to create a sparse
> > item-to-item matrix of up to 10 million significant co-occurrences
> > (significance is not globally defined but in the context of the active item
> > to recommend from; in other words support can be really tiny, confidence
> > less so).
>
> Sounds like a pretty solid size of a data set. I think the recommender
> will work fine on this -- well, I suppose it depends on your
> expectations, but this whole piece has been completely revised recently
> and I feel that it's tuned nicely now.
At this scale, random projection SVD algorithms can even work in R. With 5 billion transactions you will need a partially out-of-core implementation, but I would strongly expect the decomposition, and even any follow-on clustering, to be almost entirely I/O bound just reading your original data.
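
To make that concrete, here is a rough in-memory sketch of the random
projection idea in plain R using the Matrix package. The function name and
the toy matrix are just for illustration, not Mahout's implementation, and
the real out-of-core version would stream the data in blocks rather than
hold it all in memory:

  library(Matrix)

  # Rank-k randomized SVD: project onto a random subspace, orthonormalize,
  # then take an exact SVD of the small projected matrix.
  randomized_svd <- function(A, k, oversample = 10) {
    p <- k + oversample
    omega <- matrix(rnorm(ncol(A) * p), nrow = ncol(A), ncol = p)
    Y <- as.matrix(A %*% omega)        # first pass over A
    Q <- qr.Q(qr(Y))                   # orthonormal basis for the range of A
    B <- as.matrix(t(Q) %*% A)         # second pass over A, p x ncol(A)
    s <- svd(B, nu = k, nv = k)
    list(u = Q %*% s$u, d = s$d[1:k], v = s$v)
  }

  # Toy example: a 10000 x 1000 sparse co-occurrence-like matrix, rank 20.
  A <- rsparsematrix(10000, 1000, density = 0.01)
  fit <- randomized_svd(A, k = 20)

The nice property is that the full data is only touched twice, once to form
Y and once to form B; everything else happens on small dense matrices, which
is why I'd expect the out-of-core version to be dominated by the cost of
reading the input.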