[ https://issues.apache.org/jira/browse/MAHOUT-305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838044#action_12838044 ]

Tamas Jambor commented on MAHOUT-305:
-------------------------------------

I usually pick a random N% of the data for each user, as Ankur suggested. This 
ensures that the recommender is not biased, and it doesn't really matter that 
non-relevant items are in this subset, since they need to be ranked lower 
anyway.
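
A minimal sketch of that kind of per-user split (illustrative Java with a 
made-up Rating type and method names; not Mahout's API):

    import java.util.*;

    // Rough sketch of a per-user random holdout: for each user, a random N% of
    // that user's ratings is held out for testing and the rest stays for
    // training. The Rating type and surrounding structures are placeholders,
    // not Mahout classes.
    public final class PerUserHoldout {

      public static final class Rating {
        final long userId;
        final long itemId;
        final float value;
        Rating(long userId, long itemId, float value) {
          this.userId = userId;
          this.itemId = itemId;
          this.value = value;
        }
      }

      /**
       * Moves roughly testFraction of each user's ratings into the returned
       * test map; the remaining ratings replace that user's list in trainByUser.
       */
      public static Map<Long, List<Rating>> splitOffTestSet(
          Map<Long, List<Rating>> trainByUser, double testFraction, long seed) {
        Random random = new Random(seed);
        Map<Long, List<Rating>> testByUser = new HashMap<>();
        for (Map.Entry<Long, List<Rating>> entry : trainByUser.entrySet()) {
          List<Rating> all = new ArrayList<>(entry.getValue());
          Collections.shuffle(all, random);
          int testSize = (int) Math.round(all.size() * testFraction);
          testByUser.put(entry.getKey(), new ArrayList<>(all.subList(0, testSize)));
          entry.setValue(new ArrayList<>(all.subList(testSize, all.size())));
        }
        return testByUser;
      }
    }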

I think the way Sean implemented it is also pretty good: take the top-n 
relevant items for each user and evaluate on that data. But you have to build a 
new model for each user, which makes it impossible to use on a big data set, 
especially with SVD.
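
For comparison, here is a rough sketch of that per-user scheme (placeholder 
Model and Trainer interfaces, not the actual Mahout evaluator); the model 
rebuild inside the loop is the part that gets expensive on a big data set:

    import java.util.*;
    import java.util.stream.Collectors;

    // Illustrative sketch only: withhold each user's top-n relevant items,
    // rebuild a model on everything else, and score the recommendations against
    // the withheld items. Model, Trainer and the rating layout are placeholders,
    // not Mahout classes.
    public final class PerUserTopNEvaluationSketch {

      interface Model {
        List<Long> recommend(long userId, int howMany);
      }

      interface Trainer {
        Model train(Map<Long, Map<Long, Float>> ratingsByUser);
      }

      public static double averagePrecisionAtN(Map<Long, Map<Long, Float>> ratingsByUser,
                                               Trainer trainer, int n, float relevanceThreshold) {
        double precisionSum = 0.0;
        int evaluatedUsers = 0;
        for (Map.Entry<Long, Map<Long, Float>> user : ratingsByUser.entrySet()) {
          // Withhold this user's top-n relevant (highest-rated) items.
          List<Long> relevant = user.getValue().entrySet().stream()
              .filter(e -> e.getValue() >= relevanceThreshold)
              .sorted((a, b) -> Float.compare(b.getValue(), a.getValue()))
              .limit(n)
              .map(Map.Entry::getKey)
              .collect(Collectors.toList());
          if (relevant.isEmpty()) {
            continue;
          }
          Map<Long, Map<Long, Float>> trainingData = new HashMap<>(ratingsByUser);
          Map<Long, Float> reduced = new HashMap<>(user.getValue());
          reduced.keySet().removeAll(relevant);
          trainingData.put(user.getKey(), reduced);
          // Full model rebuild for every evaluated user: this is the costly step.
          Model model = trainer.train(trainingData);
          List<Long> recommended = model.recommend(user.getKey(), n);
          long hits = recommended.stream().filter(relevant::contains).count();
          precisionSum += (double) hits / n;
          evaluatedUsers++;
        }
        return evaluatedUsers == 0 ? 0.0 : precisionSum / evaluatedUsers;
      }
    }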

I agree that the other issue is how to deal with non-rated items. I personally 
just rank items that have known ratings, so that the relevance judgement is 
always known, but I've been thinking about changing it to count unrated items 
as non-relevant. I think there are pros and cons either way.
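
To illustrate the trade-off, a quick sketch of precision@N for one user under 
the two policies (hypothetical helper, not Mahout code):

    import java.util.*;

    // Illustrative only: precision@N for one user's ranked recommendations
    // under the two policies discussed above. The item-ID lists/sets and the
    // method itself are hypothetical, not part of Mahout.
    public final class RelevancePolicySketch {

      /**
       * If countUnratedAsNonRelevant is false, recommended items with no known
       * rating are skipped, so the relevance judgement is always known; if
       * true, they count as retrieved but non-relevant, which lowers precision.
       */
      public static double precisionAtN(List<Long> ranked,
                                        Set<Long> relevant,
                                        Set<Long> rated,
                                        int n,
                                        boolean countUnratedAsNonRelevant) {
        int retrieved = 0;
        int hits = 0;
        for (Long itemId : ranked) {
          if (retrieved >= n) {
            break;
          }
          boolean isRated = rated.contains(itemId);
          if (!isRated && !countUnratedAsNonRelevant) {
            continue;  // unknown relevance: leave it out of the judgement entirely
          }
          retrieved++;
          if (isRated && relevant.contains(itemId)) {
            hits++;
          }
        }
        return retrieved == 0 ? 0.0 : (double) hits / retrieved;
      }
    }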


> Combine both cooccurrence-based CF M/R jobs
> -------------------------------------------
>
>                 Key: MAHOUT-305
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-305
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.2
>            Reporter: Sean Owen
>            Assignee: Ankur
>            Priority: Minor
>
> We have two different but essentially identical MapReduce jobs to make 
> recommendations based on item co-occurrence: 
> org.apache.mahout.cf.taste.hadoop.{item,cooccurrence}. They ought to be 
> merged. Not sure exactly how to approach that but noting this in JIRA, per 
> Ankur.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
