[jira] Commented: (MAHOUT-103) Co-occurence based nearest neighbourhood

Ankur (JIRA) Tue, 24 Nov 2009 01:43:05 -0800

    [ 
https://issues.apache.org/jira/browse/MAHOUT-103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781838#action_12781838
 ]


Ankur commented on MAHOUT-103:
------------------------------

For this co-occurrence based recommender I am planning to write a set of 
map-reduce jobs that compute recommendations for users as follows:

1. Take the user's item history.
2. For each item in the history, fetch the top-N similar items (similarity 
based on co-occurrence).
3. Add up the co-occurrence scores if an item appears more than once (a plain 
sum, NOT a weighted avg). For example, consider a user history { M1, M2, M3 } 
and the top-3 similar movies for each of these, along with their 
co-occurrence scores:

M1 -> (A, 5), (B, 4), (C, 2)
M2 -> (D, 6), (E, 3), (F, 2)
M3 -> (G, 8), (C, 5), (B, 2)  

C and B each appear twice, so their scores are summed (C: 2 + 5, B: 4 + 2). 
The final scores in decreasing order will then look like:
(G, 8)
(C, 7)
(B, 6)
(D, 6)
(A, 5)
(E, 3)
(F, 2)

The idea I want to capture is that a candidate item gets a higher score if it 
is similar to more items in the user's click history.
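
The scoring steps above can be sketched on a single machine as follows. This 
is only an illustration in Python, not the planned map-reduce jobs; the 
function name `score_candidates` and the tie-break on item name are 
assumptions made for the example.

```python
from collections import defaultdict


def score_candidates(history, similar_items):
    """Sum co-occurrence scores of candidates over a user's history.

    history: iterable of item ids the user has interacted with.
    similar_items: dict mapping an item id to its top-N list of
        (candidate, co_occurrence_score) pairs.
    Returns (candidate, total_score) pairs, highest score first.
    """
    totals = defaultdict(int)
    for item in history:
        for candidate, score in similar_items.get(item, []):
            # Plain sum, not a weighted average: a candidate that is
            # similar to several history items accumulates more score.
            totals[candidate] += score
    # Sort by descending score; ties are broken by item name
    # (an assumption, only to make the ordering deterministic).
    return sorted(totals.items(), key=lambda pair: (-pair[1], pair[0]))


# The example from this comment.
history = ["M1", "M2", "M3"]
similar_items = {
    "M1": [("A", 5), ("B", 4), ("C", 2)],
    "M2": [("D", 6), ("E", 3), ("F", 2)],
    "M3": [("G", 8), ("C", 5), ("B", 2)],
}
ranked = score_candidates(history, similar_items)
for candidate, total in ranked:
    print(candidate, total)
```

In a map-reduce setting the inner loop would roughly correspond to emitting 
(user, candidate) -> score pairs, with the reducer doing the summation.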

Do you see any issues with this approach? Can you think of a better 
approach?

As for the precision-recall test, I am still trying to see how to divide the 
data into 'train' and 'test' sets for a fair evaluation. How do we do it in 
the existing code?

> Co-occurence based nearest neighbourhood
> ----------------------------------------
>
>                 Key: MAHOUT-103
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-103
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Collaborative Filtering
>            Reporter: Ankur
>            Assignee: Ankur
>         Attachments: jira-103.patch, mahout-103.patch.v1
>
>
> Nearest neighborhood type queries for users/items can be answered efficiently 
> and effectively by analyzing the co-occurrence model of a user/item w.r.t 
> another. This patch aims at providing an implementation for answering such 
> queries based upon simple co-occurrence counts.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
