[ https://issues.apache.org/jira/browse/MAHOUT-103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781838#action_12781838 ]
Ankur commented on MAHOUT-103:
------------------------------

For this co-occurrence based recommender I am planning to write a set of map-reduce jobs that compute recommendations for users as follows:

1. Take the user's item history.
2. For each item in the history, fetch the top-N similar items (similarity based on co-occurrence).
3. If a candidate item appears more than once, add up its co-occurrence scores (a plain sum, NOT a weighted average).

Consider, for example, a user history { M1, M2, M3 } and the top-3 similar movies for each of these, along with their co-occurrence scores:

M1 -> (A, 5), (B, 4), (C, 2)
M2 -> (D, 6), (E, 3), (F, 2)
M3 -> (G, 8), (C, 5), (B, 2)

C and B each appear twice, so C = 2 + 5 = 7 and B = 4 + 2 = 6. The final scores in decreasing order will then look like:

(G, 8) (C, 7) (B, 6) (D, 6) (A, 5) (E, 3) (F, 2)

The idea I want to capture is that a candidate item gets a higher score if it is similar to more items in the user's click history.

Do you see any issue with this approach? Is there a better approach that you can think of?

As for the precision-recall test, I am still trying to see how to divide the data into 'train' and 'test' sets for a fair evaluation. How do we do it in the existing code?

> Co-occurence based nearest neighbourhood
> ----------------------------------------
>
>                 Key: MAHOUT-103
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-103
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Collaborative Filtering
>            Reporter: Ankur
>            Assignee: Ankur
>         Attachments: jira-103.patch, mahout-103.patch.v1
>
>
> Nearest neighborhood type queries for users/items can be answered efficiently
> and effectively by analyzing the co-occurrence model of a user/item w.r.t
> another. This patch aims at providing an implementation for answering such
> queries based upon simple co-occurrence counts.
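
The scoring proposed in steps 1-3 of the comment can be sketched in a few lines of single-machine Python. This is only an illustration under stated assumptions, not the patch's map-reduce implementation: the function name `score_candidates`, the dictionary layout of `similar_items`, and the alphabetical tie-break between equal scores are all hypothetical choices made for the example. Filtering out items the user has already seen is not mentioned in the comment and is not done here.

```python
from collections import defaultdict


def score_candidates(history, similar_items):
    """Sum co-occurrence scores of candidate items over a user's history.

    history:       list of item ids the user has interacted with
    similar_items: dict mapping item id -> list of (candidate, score) pairs,
                   i.e. the precomputed top-N similar items (assumed layout)
    """
    totals = defaultdict(int)
    for item in history:
        # An item with no precomputed neighbours contributes nothing.
        for candidate, score in similar_items.get(item, []):
            # Plain sum, not a weighted average (step 3 of the comment).
            totals[candidate] += score
    # Decreasing score; equal scores ordered by item id (an assumption).
    return sorted(totals.items(), key=lambda pair: (-pair[1], pair[0]))


# The example from the comment.
similar_items = {
    "M1": [("A", 5), ("B", 4), ("C", 2)],
    "M2": [("D", 6), ("E", 3), ("F", 2)],
    "M3": [("G", 8), ("C", 5), ("B", 2)],
}
ranked = score_candidates(["M1", "M2", "M3"], similar_items)
print(ranked)
```

Run on the example from the comment, this reproduces the ordering given there, with (B, 6) listed before (D, 6) only because of the tie-break chosen above. In a map-reduce setting, one plausible decomposition would be to emit each (user, candidate) pair with its score in the map phase and do the summation in the reduce phase, but that is a guess at the design rather than something the comment specifies.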