Sorry, "k most similar users" should be "k most similar items" in the 1st step.
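
To make steps 2-4 concrete, here is a minimal sketch in plain Java of the online side, assuming the step-1 output is already loaded into memory. ScoredItem is a hypothetical (itemID, similarity) pair used for illustration, not a Mahout class:

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HistoryRecommender {

  /** Hypothetical (itemID, similarity) pair; not a Mahout class. */
  public static final class ScoredItem {
    final long itemID;
    final double similarity;
    ScoredItem(long itemID, double similarity) {
      this.itemID = itemID;
      this.similarity = similarity;
    }
  }

  /**
   * Steps 2-4: score every unrated neighbor of the items in the user's
   * history and return the top n. Similarities are summed, so an item that
   * is a near neighbor of several history items naturally ranks higher.
   */
  public static List<Long> recommend(Map<Long, List<ScoredItem>> neighbors, // step 1 output
                                     Map<Long, Float> userHistory,          // step 2 input
                                     int n) {
    Map<Long, Double> scores = new HashMap<Long, Double>();
    for (Long rated : userHistory.keySet()) {                               // step 3
      List<ScoredItem> similar = neighbors.get(rated);
      if (similar == null) {
        continue;
      }
      for (ScoredItem s : similar) {
        if (!userHistory.containsKey(s.itemID)) {  // only items the user has not rated
          Double old = scores.get(s.itemID);
          scores.put(s.itemID, (old == null ? 0.0 : old) + s.similarity);
        }
      }
    }
    List<Map.Entry<Long, Double>> ranked =                                  // step 4
        new ArrayList<Map.Entry<Long, Double>>(scores.entrySet());
    Collections.sort(ranked, new Comparator<Map.Entry<Long, Double>>() {
      public int compare(Map.Entry<Long, Double> a, Map.Entry<Long, Double> b) {
        return b.getValue().compareTo(a.getValue());
      }
    });
    List<Long> top = new ArrayList<Long>();
    for (int i = 0; i < Math.min(n, ranked.size()); i++) {
      top.add(ranked.get(i).getKey());
    }
    return top;
  }
}
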
On Sun, Feb 21, 2010 at 1:12 PM, Gökhan Çapan <[email protected]> wrote:

> To reduce the time to recommend while producing online recommendations,
> here are the steps I take for a very large dataset:
>
> 1. I compute item-item similarities (for all item pairs that have been
> rated by at least one common user), and after some optimizations (like
> content boosting) I store the k most similar users, with their degrees
> of similarity, for each item.
> 2. At recommendation time, the system takes a user history vector, which
> does not need to belong to one of the users in the dataset, as input.
> 3. The algorithm looks at all the items in the input vector and fetches
> their most similar items from step 1. If one of the most similar items
> of an item in the user history has not been rated by the user, it is
> added to the recommendation list.
> 4. The list is sorted, and the top n elements are recommended.
>
> The rating for a specific item is computed in a similar way. Also, if an
> item belongs to the most similar items of more than one item in the user
> history, the likelihood of recommending that item is higher.
>
> If you mean a system like this, I should say the implementation is
> mostly done via Mahout. The first step is computed using the
> mostSimilarItems function. The other steps are not from Mahout, but they
> are easy to implement.
>
>
> On Sat, Feb 20, 2010 at 9:46 PM, Ted Dunning <[email protected]> wrote:
>
>> This is just one of an infinite number of variations on item-based
>> recommendation. The general idea is that you do some kind of magic to
>> find item-item connections, you trim those to make it all work, and
>> then you recommend the items linked from the user's history of items
>> they liked. If the budget runs out (time, space, or $), then you trim
>> more. All that the GroupLens guys are saying is that trimming didn't
>> hurt accuracy, so it is probably good to do.
>>
>> The off-line connection finding can be done using LLR (for moderately
>> high-traffic situations), SVD (for cases where transitive dependencies
>> are important), random indexing (a poor man's SVD), or LDA (where small
>> counts make SVD give crazy results). There are many other possibilities
>> as well.
>>
>> It would be great if you felt an itch to implement some of these and
>> decided to scratch it and contribute the results back to Mahout.
>>
>> On Sat, Feb 20, 2010 at 6:46 AM, jamborta <[email protected]> wrote:
>>
>> > The basic concept of a neighbourhood for item-based recommendation
>> > comes from this paper:
>> >
>> > http://portal.acm.org/citation.cfm?id=371920.372071
>> >
>> > This is the idea:
>> >
>> > "The fact that we only need a small fraction of similar items to
>> > compute predictions leads us to an alternate model-based scheme. In
>> > this scheme, we retain only a small number of similar items. For each
>> > item j we compute the k most similar items. We term k as the model
>> > size. Based on this model building step, our prediction generation
>> > algorithm works as follows. For generating predictions for a user u
>> > on item i, our algorithm first retrieves the precomputed k most
>> > similar items corresponding to the target item i. Then it looks how
>> > many of those k items were purchased by the user u, based on this
>> > intersection then the prediction is computed using basic item-based
>> > collaborative filtering algorithm."
>>
>> --
>> Ted Dunning, CTO
>> DeepDyve
>
> --
> Gökhan Çapan

--
Gökhan Çapan
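
For completeness, step 1 can be precomputed offline along these lines, against the Mahout Taste API as I remember it (LogLikelihoodSimilarity is the LLR option Ted mentions; the file name and k are placeholders):

import java.io.File;
import java.util.List;
import org.apache.mahout.cf.taste.impl.common.LongPrimitiveIterator;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;

public class NeighborTableBuilder {
  public static void main(String[] args) throws Exception {
    // ratings.csv: one userID,itemID,rating triple per line (placeholder name)
    DataModel model = new FileDataModel(new File("ratings.csv"));
    GenericItemBasedRecommender recommender =
        new GenericItemBasedRecommender(model, new LogLikelihoodSimilarity(model));
    int k = 20; // the model size
    LongPrimitiveIterator items = model.getItemIDs();
    while (items.hasNext()) {
      long itemID = items.nextLong();
      List<RecommendedItem> similar = recommender.mostSimilarItems(itemID, k);
      for (RecommendedItem s : similar) {
        // persist (itemID, neighborID, similarity) to whatever store serves
        // the online side; stdout stands in for that here
        System.out.println(itemID + "\t" + s.getItemID() + "\t" + s.getValue());
      }
    }
  }
}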

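The per-item prediction in the paper's last step is the standard similarity-weighted average over the intersection of the k-item model and the user's rated items. A sketch of that, reusing the hypothetical ScoredItem pair from the first snippet (it drops into the same class):

  /**
   * Predicts user u's rating of item i as a similarity-weighted average of
   * u's ratings over the k items most similar to i. Returns NaN when the
   * intersection is empty.
   */
  public static double predict(List<ScoredItem> mostSimilarToI, // the k-item model for i
                               Map<Long, Float> userRatings) {
    double num = 0.0;
    double den = 0.0;
    for (ScoredItem s : mostSimilarToI) {
      Float r = userRatings.get(s.itemID);
      if (r != null) {            // only items u actually rated
        num += s.similarity * r;
        den += Math.abs(s.similarity);
      }
    }
    return den == 0.0 ? Double.NaN : num / den;
  }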