Sorry, "k most similar users" should be "k most similar items" in the 1st step.
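
To make steps 2-4 concrete, here is a minimal sketch in plain Java of the online side, assuming the step-1 output is already loaded into memory. ScoredItem is a hypothetical (itemID, similarity) pair used for illustration, not a Mahout class:

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HistoryRecommender {

  /** Hypothetical (itemID, similarity) pair; not a Mahout class. */
  public static final class ScoredItem {
    final long itemID;
    final double similarity;
    ScoredItem(long itemID, double similarity) {
      this.itemID = itemID;
      this.similarity = similarity;
    }
  }

  /**
   * Steps 2-4: score every unrated neighbor of the items in the user's
   * history and return the top n. Similarities are summed, so an item that
   * is a near neighbor of several history items naturally ranks higher.
   */
  public static List<Long> recommend(Map<Long, List<ScoredItem>> neighbors, // step 1 output
                                     Map<Long, Float> userHistory,          // step 2 input
                                     int n) {
    Map<Long, Double> scores = new HashMap<Long, Double>();
    for (Long rated : userHistory.keySet()) {                               // step 3
      List<ScoredItem> similar = neighbors.get(rated);
      if (similar == null) {
        continue;
      }
      for (ScoredItem s : similar) {
        if (!userHistory.containsKey(s.itemID)) {  // only items the user has not rated
          Double old = scores.get(s.itemID);
          scores.put(s.itemID, (old == null ? 0.0 : old) + s.similarity);
        }
      }
    }
    List<Map.Entry<Long, Double>> ranked =                                  // step 4
        new ArrayList<Map.Entry<Long, Double>>(scores.entrySet());
    Collections.sort(ranked, new Comparator<Map.Entry<Long, Double>>() {
      public int compare(Map.Entry<Long, Double> a, Map.Entry<Long, Double> b) {
        return b.getValue().compareTo(a.getValue());
      }
    });
    List<Long> top = new ArrayList<Long>();
    for (int i = 0; i < Math.min(n, ranked.size()); i++) {
      top.add(ranked.get(i).getKey());
    }
    return top;
  }
}
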
On Sun, Feb 21, 2010 at 1:12 PM, Gökhan Çapan <[email protected]> wrote:

> To reduce the time to recommend while producing online recommendations,
> here are the steps I take for a very large dataset:
>
> 1. I compute item-item similarities (for all item pairs that have been
> rated by at least one common user), and after some optimizations (like
> content boosting) I store the k most similar users, with their degrees
> of similarity, for each item.
> 2. At recommendation time, the system takes a user history vector, which
> does not need to belong to one of the users in the dataset, as input.
> 3. The algorithm looks at all the items in the input vector and fetches
> their most similar items from step 1. If one of the most similar items
> of an item in the user history has not been rated by the user, it is
> added to the recommendation list.
> 4. The list is sorted, and the top n elements are recommended.
>
> The rating for a specific item is computed in a similar way. Also, if an
> item belongs to the most similar items of more than one item in the user
> history, the likelihood of recommending that item is higher.
>
> If you mean a system like this, I should say the implementation is
> mostly done via Mahout. The first step is computed using the
> mostSimilarItems function. The other steps are not from Mahout, but they
> are easy to implement.
>
>
> On Sat, Feb 20, 2010 at 9:46 PM, Ted Dunning <[email protected]> wrote:
>
>> This is just one of an infinite number of variations on item-based
>> recommendation. The general idea is that you do some kind of magic to
>> find item-item connections, you trim those to make it all work, and
>> then you recommend the items linked from the user's history of items
>> they liked. If the budget runs out (time, space, or $), then you trim
>> more. All that the GroupLens guys are saying is that trimming didn't
>> hurt accuracy, so it is probably good to do.
>>
>> The off-line connection finding can be done using LLR (for moderately
>> high-traffic situations), SVD (for cases where transitive dependencies
>> are important), random indexing (a poor man's SVD), or LDA (where small
>> counts make SVD give crazy results). There are many other possibilities
>> as well.
>>
>> It would be great if you felt an itch to implement some of these and
>> decided to scratch it and contribute the results back to Mahout.
>>
>> On Sat, Feb 20, 2010 at 6:46 AM, jamborta <[email protected]> wrote:
>>
>> > The basic concept of a neighbourhood for item-based recommendation
>> > comes from this paper:
>> >
>> > http://portal.acm.org/citation.cfm?id=371920.372071
>> >
>> > This is the idea:
>> >
>> > "The fact that we only need a small fraction of similar items to
>> > compute predictions leads us to an alternate model-based scheme. In
>> > this scheme, we retain only a small number of similar items. For each
>> > item j we compute the k most similar items. We term k as the model
>> > size. Based on this model building step, our prediction generation
>> > algorithm works as follows. For generating predictions for a user u
>> > on item i, our algorithm first retrieves the precomputed k most
>> > similar items corresponding to the target item i. Then it looks how
>> > many of those k items were purchased by the user u, based on this
>> > intersection then the prediction is computed using basic item-based
>> > collaborative filtering algorithm."
>>
>> --
>> Ted Dunning, CTO
>> DeepDyve
>
> --
> Gökhan Çapan

--
Gökhan Çapan
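
For completeness, step 1 can be precomputed offline along these lines, against the Mahout Taste API as I remember it (LogLikelihoodSimilarity is the LLR option Ted mentions; the file name and k are placeholders):

import java.io.File;
import java.util.List;
import org.apache.mahout.cf.taste.impl.common.LongPrimitiveIterator;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;

public class NeighborTableBuilder {
  public static void main(String[] args) throws Exception {
    // ratings.csv: one userID,itemID,rating triple per line (placeholder name)
    DataModel model = new FileDataModel(new File("ratings.csv"));
    GenericItemBasedRecommender recommender =
        new GenericItemBasedRecommender(model, new LogLikelihoodSimilarity(model));
    int k = 20; // the model size
    LongPrimitiveIterator items = model.getItemIDs();
    while (items.hasNext()) {
      long itemID = items.nextLong();
      List<RecommendedItem> similar = recommender.mostSimilarItems(itemID, k);
      for (RecommendedItem s : similar) {
        // persist (itemID, neighborID, similarity) to whatever store serves
        // the online side; stdout stands in for that here
        System.out.println(itemID + "\t" + s.getItemID() + "\t" + s.getValue());
      }
    }
  }
}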

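The per-item prediction in the paper's last step is the standard similarity-weighted average over the intersection of the k-item model and the user's rated items. A sketch of that, reusing the hypothetical ScoredItem pair from the first snippet (it drops into the same class):

  /**
   * Predicts user u's rating of item i as a similarity-weighted average of
   * u's ratings over the k items most similar to i. Returns NaN when the
   * intersection is empty.
   */
  public static double predict(List<ScoredItem> mostSimilarToI, // the k-item model for i
                               Map<Long, Float> userRatings) {
    double num = 0.0;
    double den = 0.0;
    for (ScoredItem s : mostSimilarToI) {
      Float r = userRatings.get(s.itemID);
      if (r != null) {            // only items u actually rated
        num += s.similarity * r;
        den += Math.abs(s.similarity);
      }
    }
    return den == 0.0 ? Double.NaN : num / den;
  }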