[
https://issues.apache.org/jira/browse/MAHOUT-898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13158028#comment-13158028
]
Sean Owen commented on MAHOUT-898:
----------------------------------
I understand the issue, but this doesn't fix it. Say your ratings are between 1
and 5. Say you have similarity -0.5 to an item rated 3 and -0.5 to an item
rated 4. Using the absolute value in the denominator only would lead you to
estimate a preference of -3.5, which is also not possible. It's not even
reasonable to cap it to 1 here.
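To make the arithmetic above concrete, here is a minimal sketch of a similarity-weighted average, with a switch for taking the absolute value in the denominator only. The class and method names are hypothetical and are not taken from the Mahout source; only the numbers (similarities of -0.5, ratings 3 and 4) come from the comment.

```java
public class AbsDenominatorExample {

    // Similarity-weighted average of preferences.
    // When absInDenominator is true, only the denominator uses |similarity|,
    // which is the change proposed in the patch as described in this issue.
    static double estimate(double[] similarities, double[] preferences, boolean absInDenominator) {
        double numerator = 0.0;
        double denominator = 0.0;
        for (int i = 0; i < similarities.length; i++) {
            numerator += similarities[i] * preferences[i];
            denominator += absInDenominator ? Math.abs(similarities[i]) : similarities[i];
        }
        return numerator / denominator;
    }

    public static void main(String[] args) {
        double[] sims = {-0.5, -0.5};
        double[] prefs = {3.0, 4.0};
        // (-1.5 + -2.0) / (0.5 + 0.5)
        System.out.println(estimate(sims, prefs, true));  // prints -3.5
        // (-1.5 + -2.0) / (-0.5 + -0.5)
        System.out.println(estimate(sims, prefs, false)); // prints 3.5
    }
}
```

With the absolute value in the denominator the estimate is -3.5, outside the 1 to 5 range, as stated above.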
Really... negative weights are just a problem since they don't make sense. In
practice, in the framework, the *only* metric with this problem is Pearson,
since it's the only one that actually returns values < 0. In retrospect it would
have been nicer to define this as returning a value between 0 and 1.
You could use (1+similarity) as a weight, since that's at least nonnegative. I
feel like I did it this way in the beginning... and took it out as it caused
another problem. I'd have to think about just why that was. We could go back to
that; it has non-trivial implications.
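As a rough illustration of the (1+similarity) idea, the sketch below shifts each similarity from [-1,1] into [0,2] before using it as a weight. The names are hypothetical, and this is not the code that was once in the framework. Because every weight is nonnegative, the result is a convex combination of the ratings and stays within their range. The sketch returns NaN when all weights are zero (every similarity equal to -1).

```java
public class ShiftedWeightExample {

    // Weighted average using (1 + similarity) as the weight.
    // For similarities in [-1, 1] every weight lies in [0, 2].
    static double estimate(double[] similarities, double[] preferences) {
        double numerator = 0.0;
        double totalWeight = 0.0;
        for (int i = 0; i < similarities.length; i++) {
            double weight = 1.0 + similarities[i];
            numerator += weight * preferences[i];
            totalWeight += weight;
        }
        // No usable weight at all: there is nothing to average.
        if (totalWeight == 0.0) {
            return Double.NaN;
        }
        return numerator / totalWeight;
    }

    public static void main(String[] args) {
        // Same inputs as the example in the comment: both weights become 0.5.
        System.out.println(estimate(new double[] {-0.5, -0.5}, new double[] {3.0, 4.0})); // prints 3.5
    }
}
```

One consequence visible even in this small case: an item with similarity -0.5 still pulls the estimate toward its own rating rather than away from it.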
I don't want to make this exact change, but I'll leave the issue open for other ideas.
> Error in formula for preference estimation in GenericItemBasedRecommender
> -------------------------------------------------------------------------
>
> Key: MAHOUT-898
> URL: https://issues.apache.org/jira/browse/MAHOUT-898
> Project: Mahout
> Issue Type: Bug
> Components: Collaborative Filtering
> Environment: mahout-core
> Reporter: Paulo Villegas
> Assignee: Sean Owen
> Priority: Minor
> Labels: patch
> Fix For: 0.6
>
> Attachments: GenericItemBasedRecommender.diff
>
>
> The formula to estimate the preference for an item in the Taste item-based
> recommender normalizes by the sum of similarities for items used in
> estimation. But the terms in the sum taken to normalize should be in absolute
> value, since they can be negative (e.g. when using Pearson correlation,
> similarity is in [-1,1]). Currently they are not, so when there are both
> negative and positive values they cancel out, giving a small denominator and
> incorrectly boosting the preference for the item (symptom: a predicted
> preference easily reaches the maximum value, since the quotient becomes
> large and is capped afterwards).
> The patch is rather trivial (a one-liner, actually) for
> src/main/java/org/apache/mahout/cf/taste/impl/recommender/GenericItemBasedRecommender.java
> Note: the same error occurs in GenericUserBasedRecommender, and the same fix applies there.
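The cancellation described in the report can be sketched with a small, made-up example. The names and the input values are hypothetical, and the 1 to 5 rating range is the one assumed in the comment above; this is an illustration, not the Mahout implementation.

```java
public class CancellationExample {

    // Similarity-weighted average; the denominator uses either the signed
    // similarities or their absolute values.
    static double estimate(double[] similarities, double[] preferences, boolean absInDenominator) {
        double numerator = 0.0;
        double denominator = 0.0;
        for (int i = 0; i < similarities.length; i++) {
            numerator += similarities[i] * preferences[i];
            denominator += absInDenominator ? Math.abs(similarities[i]) : similarities[i];
        }
        return numerator / denominator;
    }

    // Clamp an estimate into the assumed rating range [1, 5].
    static double cap(double estimate) {
        return Math.max(1.0, Math.min(5.0, estimate));
    }

    public static void main(String[] args) {
        double[] sims = {0.5, -0.25};
        double[] prefs = {4.0, 2.0};
        // Signed denominator: (2.0 - 0.5) / (0.5 - 0.25) = 6.0, capped to the maximum.
        System.out.println(cap(estimate(sims, prefs, false))); // prints 5.0
        // Absolute denominator: (2.0 - 0.5) / (0.5 + 0.25) = 2.0.
        System.out.println(cap(estimate(sims, prefs, true)));  // prints 2.0
    }
}
```

Here the positive and negative similarities partly cancel in the signed denominator, so the quotient overshoots the range and the cap pins it at the maximum rating.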