[ https://issues.apache.org/jira/browse/MAHOUT-305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837377#action_12837377 ]
Ted Dunning commented on MAHOUT-305:
------------------------------------

My own experience is that all that counts in recommendations is the probability of a click (interest) on a set of recommendations. As such, the best analog is probably precision at 10 or 20. I don't think that recall at 10 or 20 makes any sense at all: in a depth-limited situation like this, you have given up on recall and are only looking at precision.

Ankur's suggestion about keeping the most recent 4's and 5's as test data seems right to me. My only beefs are that you don't need rec...@10, and there is the question of what to do with unrated items. Presumably a new-style algorithm could surface items that the user hadn't thought of but really likes. In practice, I think that counting unrated items in the results as misses isn't a big deal in the Netflix data. In the real world, where test data is scarcer, I would count unrated items as misses in off-line evaluation, but try to run as many alternatives as possible against live users.

> Combine both cooccurrence-based CF M/R jobs
> -------------------------------------------
>
>                 Key: MAHOUT-305
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-305
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.2
>            Reporter: Sean Owen
>            Assignee: Ankur
>            Priority: Minor
>
> We have two different but essentially identical MapReduce jobs to make
> recommendations based on item co-occurrence:
> org.apache.mahout.cf.taste.hadoop.{item,cooccurrence}. They ought to be
> merged. Not sure exactly how to approach that but noting this in JIRA, per
> Ankur.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
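[Editorial sketch] The evaluation scheme discussed in the comment above — precision at a cutoff k, with held-out recent 4/5 ratings as the relevant set and unrated items counted as misses — can be sketched as follows. This is an illustrative example, not Mahout code; the function name and signature are hypothetical.

```python
def precision_at_k(recommended, relevant, k=10):
    """Fraction of the top-k recommendations that appear in the held-out set.

    `recommended` is an ordered list of item ids; `relevant` is the set of
    held-out items the user rated 4 or 5. Items the user never rated are
    simply absent from `relevant`, so they count as misses (hypothetical
    sketch of the policy discussed above).
    """
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / k

# Example: 3 of the top 10 recommended items were held-out 4/5-rated items.
print(precision_at_k(list(range(10)), {1, 4, 7}, k=10))  # 0.3
```

Dividing by k rather than by the number of hits found is what makes this a precision measure: the depth limit fixes the denominator, which is why recall at the same cutoff carries little information, as noted above.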