[ https://issues.apache.org/jira/browse/MAHOUT-460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898727#action_12898727 ]
Sebastian Schelter commented on MAHOUT-460:
-------------------------------------------

Patch attached, which fixes a big misunderstanding in the existing code.

I had created MaybePruneRowsMapper from Sean's old UserVectorToCooccurrenceMapper. Its main use should have been to limit the number of cooccurrences per item in the RecommenderJob. Unfortunately it was applied to the item-user matrix (the item vectors) instead of the user-item matrix (the user vectors), which is now corrected.

Please note that the approach taken here is only a heuristic, as each mapper instance tries to limit the number of cooccurrences on its own, if I understand the code correctly.

I introduced a new job argument "maxCooccurrencesPerItem" with a default of 100.

> Add "maxPreferencesPerItemConsidered" option to o.a.m.cf.taste.hadoop.similarity.item.ItemSimilarityJob
> -------------------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-460
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-460
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>            Reporter: Sebastian Schelter
>         Attachments: MAHOUT-460.patch
>
>
> Because "coocurrence algorithms ... scale in the square of the number of
> occurrences most popular item" (Ted wrote that in a recent mail) we should
> offer a parameter to the ItemSimilarity job that makes it limit the number of
> considered preferences per item. RecommenderJob already has such an option.
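
To illustrate why per-mapper pruning is only a heuristic, here is a minimal sketch in plain Java. It is not the code from the patch and does not use any Mahout or Hadoop classes. The class name LocalCooccurrencePruner and its prune method are hypothetical, and for simplicity it caps how often an item is kept across the user vectors that one instance sees, as a stand-in for capping cooccurrences.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: one instance plays the role of one mapper.
class LocalCooccurrencePruner {

    // Cap applied by this instance alone (assumed analogous to "maxCooccurrencesPerItem").
    private final int maxPerItem;

    // How often each item ID has been seen by THIS instance so far.
    private final Map<Long, Integer> seenCounts = new HashMap<>();

    LocalCooccurrencePruner(int maxPerItem) {
        this.maxPerItem = maxPerItem;
    }

    // Returns the items of one user vector that are still under the local cap.
    List<Long> prune(List<Long> itemIds) {
        List<Long> kept = new ArrayList<>();
        for (Long itemId : itemIds) {
            int count = seenCounts.merge(itemId, 1, Integer::sum);
            if (count <= maxPerItem) {
                kept.add(itemId);
            }
        }
        return kept;
    }
}
```

Because every instance keeps its own counts, two instances with a cap of 2 can together let the same item through up to 4 times. The global limit is therefore only approximate and depends on how the input is split across mappers.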