[jira] Commented: (MAHOUT-305) Combine both cooccurrence-based CF M/R jobs

Ankur (JIRA) Mon, 22 Feb 2010 23:35:51 -0800

    [ 
https://issues.apache.org/jira/browse/MAHOUT-305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837123#action_12837123
 ]


Ankur commented on MAHOUT-305:
------------------------------

With co-occurrence analysis we are dropping ratings. So if a lot of the people 
who watched "Harry Potter" also watched "Maid in Manhattan", it will have a 
higher chance of getting recommended regardless of ratings.

I am trying not to be influenced too much by ratings, as that is not the 
strength of this algorithm. Where it really shines is when you have lots and 
lots of sparse user click data, where a click is either present or absent: 
something like an online book store or a shopping site. We are sticking with 
Netflix as there is no such publicly available dataset AFAIK.

Ok so moving forward with the action plan, here is what I propose to do. Please 
feel free to suggest modifications.

1. For each user, take out the most recent movies that they have rated 3, 4, 
or 5 as TEST data. Use the remaining ratings as TRAIN data.
2. Run both implementations in an identical environment on test data and 
record runtimes and results.
3. Join recommendation results with TEST data on the 'user' key and calculate 
precision and recall.
4. Report average precision and recall.
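
To make step 1 concrete, here is a minimal sketch of the per-user split. It 
is only an illustration, not the actual job: the tuple layout 
(user, movie, rating, timestamp), the function name split_train_test, and the 
n_test cutoff (how many recent movies to hold out) are all assumptions, since 
the plan above does not fix them.

```python
from collections import defaultdict


def split_train_test(ratings, n_test=10, min_rating=3):
    """Hold out each user's most recent well-rated movies as TEST data.

    ratings: iterable of (user, movie, rating, timestamp) tuples.
    n_test and min_rating are illustrative assumptions, not fixed by the plan.
    """
    by_user = defaultdict(list)
    for user, movie, rating, ts in ratings:
        by_user[user].append((ts, movie, rating))

    train, test = [], []
    for user, rows in by_user.items():
        # Newest first, so the first qualifying rows are the most recent ones.
        rows.sort(key=lambda r: r[0], reverse=True)
        held_out = 0
        for ts, movie, rating in rows:
            if rating >= min_rating and held_out < n_test:
                test.append((user, movie, rating, ts))
                held_out += 1
            else:
                # Low ratings and older movies stay in TRAIN.
                train.append((user, movie, rating, ts))
    return train, test
```

Everything not held out, including movies rated 1 or 2, stays in TRAIN.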
 
So, with the top-rated movies separated out as TEST data, for each user:

Precision @ 10 = (movies rated 3, 4, or 5 that were recommended and are 
actually present in the user's TEST data) / 10
Recall @ 10 = (movies rated 3, 4, or 5 that were recommended and are actually 
present in the user's TEST data) / (all movies rated 3, 4, or 5 seen by the 
user)
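
As a rough sketch of how these two numbers could be computed and averaged 
over users: the function names, the dict-based inputs, and the choice to 
count a user with no recommendations as zero precision and recall are 
assumptions made for illustration only.

```python
def precision_recall_at_k(recommended, relevant, k=10):
    """recommended: ranked list of movie ids (assumed free of duplicates).
    relevant: set of held-out movies the user rated 3, 4, or 5."""
    top_k = recommended[:k]
    hits = sum(1 for movie in top_k if movie in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def average_precision_recall(recs_by_user, test_by_user, k=10):
    """Join recommendations with TEST data on the user key and average."""
    if not test_by_user:
        return 0.0, 0.0
    total_p = total_r = 0.0
    for user, relevant in test_by_user.items():
        # Assumption: a user with no recommendations scores zero on both.
        p, r = precision_recall_at_k(recs_by_user.get(user, []), relevant, k)
        total_p += p
        total_r += r
    n = len(test_by_user)
    return total_p / n, total_r / n
```

Dividing precision by a fixed 10 follows the formula above, so a user who 
receives fewer than 10 recommendations is penalised for the missing slots.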

Hope this is clearer.


> Combine both cooccurrence-based CF M/R jobs
> -------------------------------------------
>
>                 Key: MAHOUT-305
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-305
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.2
>            Reporter: Sean Owen
>            Assignee: Ankur
>            Priority: Minor
>
> We have two different but essentially identical MapReduce jobs to make 
> recommendations based on item co-occurrence: 
> org.apache.mahout.cf.taste.hadoop.{item,cooccurrence}. They ought to be 
> merged. Not sure exactly how to approach that but noting this in JIRA, per 
> Ankur.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
