[ 
https://issues.apache.org/jira/browse/MAHOUT-460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898727#action_12898727
 ] 

Sebastian Schelter commented on MAHOUT-460:
-------------------------------------------

Patch attached, which fixes a significant misunderstanding in the existing code.
I had created MaybePruneRowsMapper from Sean's old
UserVectorToCooccurrenceMapper. Its main purpose was to limit the number of
cooccurrences per item in the RecommenderJob. Unfortunately, it was applied to
the item-user matrix (the item vectors) instead of the user-item matrix (the
user vectors); this is now corrected.

Please note that the approach taken here is only a heuristic: if I understand 
the code correctly, each mapper instance tries to limit the number of 
cooccurrences on its own, so the limit is not enforced across all mappers.

I introduced a new job argument "maxCooccurrencesPerItem" with a default of 100.
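
To illustrate why a per-mapper limit is only a heuristic, here is a minimal
sketch. It is not the code from the patch: the class LocalPruner, its prune
method and the simple counting rule are hypothetical and only stand in for one
mapper instance that keeps its own private counts.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for one mapper instance; NOT the logic of the patch. */
class LocalPruner {

    private final int maxPerItem;

    // Counts are local to this instance; other instances never see them.
    private final Map<Integer, Integer> seen = new HashMap<>();

    LocalPruner(int maxPerItem) {
        this.maxPerItem = maxPerItem;
    }

    /**
     * Returns a copy of the user vector (item index -> preference value)
     * without the items this instance has already let through maxPerItem times.
     */
    Map<Integer, Double> prune(Map<Integer, Double> userVector) {
        Map<Integer, Double> kept = new LinkedHashMap<>();
        for (Map.Entry<Integer, Double> entry : userVector.entrySet()) {
            // Increment the local count for this item and read the new value.
            int count = seen.merge(entry.getKey(), 1, Integer::sum);
            if (count <= maxPerItem) {
                kept.put(entry.getKey(), entry.getValue());
            }
        }
        return kept;
    }
}
```

With two such instances and a limit of 1, the same item can still pass through
twice, once per instance, because neither instance knows what the other has
already seen. The bound therefore holds per mapper, not globally.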

> Add "maxPreferencesPerItemConsidered" option to 
> o.a.m.cf.taste.hadoop.similarity.item.ItemSimilarityJob
> -------------------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-460
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-460
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>            Reporter: Sebastian Schelter
>         Attachments: MAHOUT-460.patch
>
>
> Because "cooccurrence algorithms ... scale in the square of the number of 
> occurrences [of the] most popular item" (Ted wrote that in a recent mail) we 
> should offer a parameter to the ItemSimilarityJob that makes it limit the 
> number of considered preferences per item. RecommenderJob already has such an 
> option.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.