[jira] Updated: (MAHOUT-420) Improving the distributed item-based recommender

Sean Owen (JIRA) Thu, 08 Jul 2010 12:21:16 -0700

     [ https://issues.apache.org/jira/browse/MAHOUT-420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Sean Owen updated MAHOUT-420:
-----------------------------

           Status: Resolved  (was: Patch Available)
         Assignee: Sean Owen
    Fix Version/s: 0.4
       Resolution: Fixed

> Improving the distributed item-based recommender
> ------------------------------------------------
>
>                 Key: MAHOUT-420
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-420
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>            Reporter: Sebastian Schelter
>            Assignee: Sean Owen
>             Fix For: 0.4
>
>         Attachments: MAHOUT-420-2.patch, MAHOUT-420-2a.patch, 
> MAHOUT-420-3.patch, MAHOUT-420.patch
>
>
> A summary of the discussion on the mailing list:
> Extend the distributed item-based recommender from using only simple 
> cooccurrence counts to using the standard computations of an item-based 
> recommender as defined in Sarwar et al "Item-Based Collaborative Filtering 
> Recommendation Algorithms" 
> (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.144.9927&rep=rep1&type=pdf).
> The distributed recommender computes, for every user, a prediction value for 
> each item that user has not rated yet. The computation is done in the 
> following way:
>  u = a user
>  i = an item not yet rated by u
>  N = all items cooccurring with i
>  Prediction(u,i) = sum(all n from N: cooccurrences(i,n) * rating(u,n))
> The formula used in the paper, which is also used by 
> GenericItemBasedRecommender.doEstimatePreference(...), looks very similar 
> to the one above:
>  u = a user
>  i = an item not yet rated by u
>  N = all items similar to i (where similarity is usually computed by 
> pairwise comparison of the item vectors of the user-item matrix)
>  Prediction(u,i) = sum(all n from N: similarity(i,n) * rating(u,n)) / sum(all 
> n from N: abs(similarity(i,n)))
> There are only two differences:
>  a) instead of the cooccurrence count, similarity measures such as Pearson 
> or cosine can be used
>  b) the resulting value is normalized by the sum of the absolute similarities
> To address difference a), we would only need to replace the part that 
> computes the cooccurrence matrix with the code from ItemSimilarityJob or the 
> code introduced in MAHOUT-418; we could then compute arbitrary similarity 
> matrices and use them in the same way the cooccurrence matrix is currently 
> used. We just need to separate the steps up to creating the cooccurrence 
> matrix from the rest, which is simple since they are already different MR jobs. 
> Regarding difference b), from a first look at the implementation I think it 
> should be possible to transfer the necessary similarity matrix entries from 
> the PartialMultiplyMapper to the AggregateAndRecommendReducer to be able to 
> compute the normalization value in the denominator of the formula. This will 
> take a little work, but is still straightforward. It can be in the 
> "common" part of the process, done after the similarity matrix is generated.
> I think work on this issue should wait until MAHOUT-418 is resolved as the 
> implementation here depends on how the pairwise similarities will be computed 
> in the future.
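
The two prediction formulas quoted above can be compared side by side in a
small in-memory sketch. This is not the Mahout implementation and does not
use Hadoop; the function names and the nested-dict data layout are
hypothetical, chosen only for illustration. It also assumes that neighbors
the user has not rated are skipped in both sums, and that a zero denominator
means no prediction can be made.

```python
def predict_cooccurrence(user_ratings, cooccurrences, item):
    """Prediction(u,i) = sum over n in N of cooccurrences(i,n) * rating(u,n).

    user_ratings:   {item_id: rating} for one user (hypothetical layout)
    cooccurrences:  {item_id: {other_item_id: count}}
    """
    total = 0.0
    for other, count in cooccurrences.get(item, {}).items():
        # Assumption: neighbors the user has not rated contribute nothing.
        if other in user_ratings:
            total += count * user_ratings[other]
    return total


def predict_normalized(user_ratings, similarities, item):
    """Prediction(u,i) = sum(similarity(i,n) * rating(u,n))
                         / sum(abs(similarity(i,n)))."""
    numerator = 0.0
    denominator = 0.0
    for other, sim in similarities.get(item, {}).items():
        # Assumption: only neighbors rated by the user enter either sum.
        if other in user_ratings:
            numerator += sim * user_ratings[other]
            denominator += abs(sim)
    if denominator == 0.0:
        return None  # no rated neighbors, so no prediction
    return numerator / denominator
```

The second function needs the similarity values themselves at aggregation
time, not only the products, which is why the description proposes passing
the similarity matrix entries along to the reducer.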

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
