[ https://issues.apache.org/jira/browse/MAHOUT-305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837123#action_12837123 ]
Ankur commented on MAHOUT-305:
------------------------------

With co-occurrence analysis we are dropping ratings. So if a lot of people who watched "Harry Potter" also watched "Maid in Manhattan", the latter will have a higher chance of getting recommended regardless of ratings. I am trying not to be influenced too much by ratings, as that is not the strength of this algorithm. Where it really shines is when you have lots and lots of sparse user click data, where a click may be present or absent: something like an online book store or a shopping site. We are sticking with Netflix as there is no such publicly available dataset AFAIK.

Ok, so moving forward with the action plan, here is what I propose to do. Please feel free to suggest modifications.

1. For each user, take out the most recent movies that he has rated 3, 4 or 5 as TEST data. Use the remaining as TRAIN data.
2. Run both implementations in an identical environment on test data and record runtimes and results.
3. Join recommendation results with TEST data on the 'user' key and calculate precision and recall.
4. Report average precision & recall.

When separating top ratings as TEST data, for each user:

Precision @ 10 = (3,4,5 rating movies recommended & actually present) / 10
Recall @ 10 = (3,4,5 rating movies recommended & actually present) / (all 3,4,5 movies seen by user)

Hope this was more clear.

> Combine both cooccurrence-based CF M/R jobs
> -------------------------------------------
>
>                 Key: MAHOUT-305
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-305
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.2
>            Reporter: Sean Owen
>            Assignee: Ankur
>            Priority: Minor
>
> We have two different but essentially identical MapReduce jobs to make
> recommendations based on item co-occurrence:
> org.apache.mahout.cf.taste.hadoop.{item,cooccurrence}. They ought to be
> merged. Not sure exactly how to approach that but noting this in JIRA, per
> Ankur.
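
For anyone following along, the Precision @ 10 and Recall @ 10 definitions in the comment above could be sketched roughly as below. This is only an illustration in plain Python, not the Mahout M/R code: the function names, the dict-based inputs and the item IDs are all made up for the example, and treating the "join on user" as an inner join (users missing from either side are skipped) is an assumption.

```python
def precision_recall_at_k(recommended, held_out, k=10):
    """Per-user precision@k and recall@k as defined in the comment.

    recommended: ranked list of item IDs produced by a recommender.
    held_out:    set of TEST items the user rated 3, 4 or 5.
    (Both names are hypothetical, chosen for this sketch.)
    """
    top_k = list(recommended)[:k]
    hits = len(set(top_k) & set(held_out))
    # Precision divides by k itself (10 in the comment), even if fewer
    # than k items were recommended.
    precision = hits / k
    # Recall divides by the number of held-out 3/4/5-rated items.
    recall = hits / len(held_out) if held_out else 0.0
    return precision, recall


def average_precision_recall(recs_by_user, test_by_user, k=10):
    """Join recommendations with TEST data on the user key and average.

    Assumption: inner join, so users absent from either dict are skipped.
    """
    users = [u for u in test_by_user if u in recs_by_user]
    if not users:
        return 0.0, 0.0
    pairs = [precision_recall_at_k(recs_by_user[u], test_by_user[u], k)
             for u in users]
    avg_precision = sum(p for p, _ in pairs) / len(pairs)
    avg_recall = sum(r for _, r in pairs) / len(pairs)
    return avg_precision, avg_recall


# Toy data, invented for illustration only.
recs = {
    "u1": ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"],
    "u2": ["x", "y"],
}
test = {
    "u1": {"a", "c", "z", "q"},
    "u2": {"y"},
    "u3": {"m"},  # no recommendations for u3, so the join drops this user
}
avg_p, avg_r = average_precision_recall(recs, test, k=10)
```

Running the same two functions over the output of each implementation would give directly comparable average figures for step 4.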
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.