OK, I will have another look. That makes more sense. I think the index still has to be adjusted, but that's simple.

On Apr 2, 2013 8:37 AM, "Cunlu Zou (JIRA)" <[email protected]> wrote:
>
>      [ https://issues.apache.org/jira/browse/MAHOUT-1185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>
> Cunlu Zou reopened MAHOUT-1185:
> -------------------------------
>
> Please check the code carefully. Two variables are calculated in the
> processOneUser function. The average diffs (the variable *average* in the
> code) are calculated correctly, as you said, but there is also another
> variable that holds the average preference value for each *individual item*
> (the variable *itemAverage* in the code). The two are totally different.
>
> The itemAverage value is used when no diff values are available to predict
> the preference. For example, suppose we have the following user-pref matrix
> (a-c are users, A-C are items):
>
> |   | A | B | C |
> | a | 1 | - | 3 |
> | b | 2 | - | 4 |
> | c | - | 2 | - |
>
> For user c, we want to predict the preference value for item C. We only
> know user c's preference value for item B, and there is no diff value
> available between B and C. In this case, Mahout falls back to the average
> value for item C, which is (3+4)/2 = 3.5, as the predicted value for item
> C. The same applies when predicting user c's preference value for item A,
> which gives (1+2)/2 = 1.5. By comparing the predicted values, we then
> recommend item C, not item A, to user c.
>
> However, the code has a mistake in calculating this average value (*NOT*
> the DIFF value), as I stated in the previous comments. I hope this makes
> it clear.
>
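To make the fallback described in the comment concrete, here is a minimal sketch of the per-item average computed from the example matrix. It is not Mahout code: the class name `ItemAverageExample`, the `PREFS` array, and the `itemAverage` helper are all hypothetical and exist only for illustration. Missing preferences are represented as `null`.

```java
public class ItemAverageExample {
    // Hypothetical data: rows are users a, b, c; columns are items A, B, C.
    // null means the user has no preference for that item.
    static final Double[][] PREFS = {
        {1.0, null, 3.0},
        {2.0, null, 4.0},
        {null, 2.0, null},
    };

    /** Mean of all known preferences for one item column, or NaN if none exist. */
    static double itemAverage(Double[][] prefs, int item) {
        double sum = 0.0;
        int count = 0;
        for (Double[] row : prefs) {
            if (row[item] != null) {
                sum += row[item];
                count++;
            }
        }
        return count == 0 ? Double.NaN : sum / count;
    }

    public static void main(String[] args) {
        // With no diff between B and A or between B and C, user c's
        // predictions fall back to these per-item averages.
        System.out.println("item A: " + itemAverage(PREFS, 0)); // 1.5
        System.out.println("item C: " + itemAverage(PREFS, 2)); // 3.5
    }
}
```

Since 3.5 > 1.5, item C ranks above item A for user c, which matches the outcome described in the comment. If item C's average were missing or wrong, that ranking could change.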
> > MemoryDiffStorage.class has a bug for slope one algorithm which could cause incorrect recommendation results
> > ------------------------------------------------------------------------------------------------------------
> >
> >                 Key: MAHOUT-1185
> >                 URL: https://issues.apache.org/jira/browse/MAHOUT-1185
> >             Project: Mahout
> >          Issue Type: Bug
> >          Components: Collaborative Filtering
> >    Affects Versions: 0.7
> >         Environment: Ubuntu
> >            Reporter: Cunlu Zou
> >            Assignee: Sean Owen
> >              Labels: patch
> >         Attachments: MemoryDiffStorage.patch
> >
> >   Original Estimate: 10m
> >  Remaining Estimate: 10m
> >
> > The function processOneUser(long averageCount, long userID) in the
> > MemoryDiffStorage.class file contains a bug in calculating the
> > itemAverage. The function calculates the average difference among items
> > (in a nested loop) and also the average individual item preference value
> > in the same outer loop, which only runs from 0 to length-2 (*for (int i =
> > 0; i < length - 1; i++)*). As a result, the itemAverage variable does not
> > count the last item's preference value for any user, which could lead to
> > incorrect recommendation results.
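For reference, the loop-bound problem in the report can be reproduced with a small stand-alone sketch. This is not the actual MemoryDiffStorage source: the class `LoopBoundSketch` and both methods are hypothetical, and they only record which item indices would have their per-item average updated when that update shares an outer loop bounded by `length - 1`. The "fixed" variant is just one possible adjustment, not necessarily what the attached patch does.

```java
import java.util.ArrayList;
import java.util.List;

public class LoopBoundSketch {
    /** Reported loop shape: the outer loop stops at index length - 2. */
    static List<Integer> itemsAveragedBuggy(int length) {
        List<Integer> averaged = new ArrayList<>();
        for (int i = 0; i < length - 1; i++) {
            for (int j = i + 1; j < length; j++) {
                // the pairwise diff between item i and item j would be updated here
            }
            averaged.add(i); // the per-item average is updated for item i only
        }
        return averaged;
    }

    /** One possible adjustment (an assumption): let the outer loop reach the last item. */
    static List<Integer> itemsAveragedFixed(int length) {
        List<Integer> averaged = new ArrayList<>();
        for (int i = 0; i < length; i++) {
            for (int j = i + 1; j < length; j++) {
                // pairwise diffs are unchanged: the inner loop is empty when i == length - 1
            }
            averaged.add(i);
        }
        return averaged;
    }

    public static void main(String[] args) {
        System.out.println(itemsAveragedBuggy(3)); // [0, 1]
        System.out.println(itemsAveragedFixed(3)); // [0, 1, 2]
        System.out.println(itemsAveragedBuggy(1)); // []
    }
}
```

Under this sketch, a user with a single preference (such as user c in the example matrix) contributes nothing to any item average in the buggy variant, and for users a and b the last item in their preference array is skipped. Extending the outer bound leaves the diff computation untouched because the inner loop simply does not execute for the final index.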
