[jira] [Commented] (MAHOUT-898) Error in formula for preference estimation in GenericItemBasedRecommender

Paulo Villegas (Commented) (JIRA) Sun, 27 Nov 2011 14:27:06 -0800

    [ 
https://issues.apache.org/jira/browse/MAHOUT-898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13158051#comment-13158051
 ]


Paulo Villegas commented on MAHOUT-898:
---------------------------------------

I think that your example is actually the desired behaviour :-) Pearson 
correlation measures the linear dependency between two variables. If it is 0, 
the two are uncorrelated (no linear dependence), so that item shouldn't 
influence your preference, and indeed it doesn't. But if it is negative, there 
is a linear dependence with negative slope. That is, my preferences for the 
item being estimated are negatively correlated with those for the other items: 
when they have a high rating, mine for the new item should be low. So, if the 
items are rated 3 & 4, giving a 1 (capping to the minimum) is not totally 
unreasonable, though perhaps a bit extreme (having only items with negative 
correlations shouldn't happen too often anyway, though I have indeed seen it).
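
To make the 3 & 4 case concrete, here is a minimal sketch of the estimate with 
the absolute-value normalization. It is not the Mahout code: the class name, 
the 1..5 rating scale and the similarity values of -0.5 are assumptions chosen 
only for illustration.

```java
// Illustrative sketch only; class name, rating scale and inputs are assumptions.
final class NegativeCorrelationSketch {
    static final double MIN_PREF = 1.0; // assumed lowest rating
    static final double MAX_PREF = 5.0; // assumed highest rating

    /** Similarity-weighted average, normalized by the sum of |similarity|, then capped. */
    static double estimate(double[] similarities, double[] prefs) {
        double weighted = 0.0;
        double totalAbs = 0.0;
        for (int i = 0; i < prefs.length; i++) {
            weighted += similarities[i] * prefs[i];
            totalAbs += Math.abs(similarities[i]);
        }
        if (totalAbs == 0.0) {
            return Double.NaN; // nothing correlated: no estimate possible
        }
        double raw = weighted / totalAbs;
        return Math.max(MIN_PREF, Math.min(MAX_PREF, raw));
    }

    public static void main(String[] args) {
        // Two items rated 3 and 4, both negatively correlated with the target item.
        double[] sims = {-0.5, -0.5};
        double[] prefs = {3.0, 4.0};
        // The raw quotient is negative, so the result is capped to MIN_PREF.
        System.out.println(estimate(sims, prefs));
    }
}
```

An item with similarity 0 adds nothing to either sum here, which is the 
behaviour described above for uncorrelated items.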

Even though Pearson is the only metric producing negative values, it is not a 
fringe case, since it is probably the most used metric for neighborhood CF (and 
for good reason -- it tends to produce the best results and it costs much less 
than rank-based metrics such as Spearman). Hence it is worth ensuring that it 
behaves reasonably.

I saw the (1+similarity) variant when looking at previous versions; it comes 
from issue MAHOUT-321. But the problem, when it comes to Pearson, is that it 
lets items with a correlation of 0 influence the final result (and they 
shouldn't, since they are uncorrelated with the item being computed).
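
A small sketch of that effect, assuming the variant simply uses 
(1 + similarity) as the weight in both the numerator and the denominator. The 
class, the `shift` parameter and the numbers are hypothetical, not taken from 
MAHOUT-321.

```java
// Illustrative sketch only; names and inputs are assumptions.
final class ShiftedWeightSketch {
    /**
     * Weighted average of prefs using (similarity + shift) as the weight.
     * shift = 0.0 uses the similarity directly; shift = 1.0 is the
     * (1 + similarity) variant.
     */
    static double estimate(double[] similarities, double[] prefs, double shift) {
        double weighted = 0.0;
        double totalWeight = 0.0;
        for (int i = 0; i < prefs.length; i++) {
            double weight = similarities[i] + shift;
            weighted += weight * prefs[i];
            totalWeight += Math.abs(weight);
        }
        return totalWeight == 0.0 ? Double.NaN : weighted / totalWeight;
    }

    public static void main(String[] args) {
        // One strongly correlated item rated 4, one uncorrelated item rated 1.
        double[] sims = {0.8, 0.0};
        double[] prefs = {4.0, 1.0};
        System.out.println("plain:   " + estimate(sims, prefs, 0.0));
        System.out.println("shifted: " + estimate(sims, prefs, 1.0));
    }
}
```

With the plain weight the uncorrelated item drops out and the estimate stays 
at about 4; with the shifted weight it gets weight 1 and pulls the estimate 
down to just under 3.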

The estimation would probably work better if ratings could be mean-centered 
(i.e. remove the mean before getting into the preference estimation), which is 
also a standard practice. I'm trying to do something along these lines, but in 
the meantime I proposed the 'abs' solution to at least avoid bizarre outputs 
(the current behaviour produces 'surprising' recommendations, and while some 
serendipity is a desired behaviour in a recommender, it would be better to have 
a way of controlling it).
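
A rough sketch of what mean-centering could look like, assuming the mean being 
removed is the user's average rating (other choices, such as per-item means, 
are possible). The class name and the inputs are hypothetical, and this is not 
existing Mahout code.

```java
// Illustrative sketch only; centering on the user's mean is an assumption.
final class MeanCenteredSketch {
    /**
     * Estimates a preference as the user's mean plus a similarity-weighted
     * average of deviations from that mean, normalized by the sum of |similarity|.
     */
    static double estimate(double userMean, double[] similarities, double[] prefs) {
        double weightedDeviation = 0.0;
        double totalAbs = 0.0;
        for (int i = 0; i < prefs.length; i++) {
            weightedDeviation += similarities[i] * (prefs[i] - userMean);
            totalAbs += Math.abs(similarities[i]);
        }
        // With no correlated items, fall back to the user's mean.
        return totalAbs == 0.0 ? userMean : userMean + weightedDeviation / totalAbs;
    }

    public static void main(String[] args) {
        // Same 3 & 4 case as before, for a user whose mean rating is assumed to be 3.
        double[] sims = {-0.5, -0.5};
        double[] prefs = {3.0, 4.0};
        System.out.println(estimate(3.0, sims, prefs));
    }
}
```

Under these assumptions the negatively correlated items push the estimate 
somewhat below the user's mean (2.5 here) instead of all the way down to the 
minimum rating.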
                
> Error in formula for preference estimation in GenericItemBasedRecommender
> -------------------------------------------------------------------------
>
>                 Key: MAHOUT-898
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-898
>             Project: Mahout
>          Issue Type: Bug
>          Components: Collaborative Filtering
>         Environment: mahout-core
>            Reporter: Paulo Villegas
>            Assignee: Sean Owen
>            Priority: Minor
>              Labels: patch
>             Fix For: 0.6
>
>         Attachments: GenericItemBasedRecommender.diff
>
>
> The formula to estimate the preference for an item in the Taste item-based 
> recommender normalizes by the sum of similarities for items used in 
> estimation. But the terms in the sum taken to normalize should be in absolute 
> value, since they can be negative (e.g. when using Pearson correlation, 
> similarity is in [-1,1]). Now they are not, and as a result when there are 
> negative and positive values they cancel out, giving a small denominator and 
> incorrectly boosting the preference for the item (symptom: it is easy for a 
> predicted preference to take the maximum value, since the quotient becomes 
> large and it is capped afterwards)
> The patch is rather trivial (a one-liner, actually) for 
> src/main/java/org/apache/mahout/cf/taste/impl/recommender/GenericItemBasedRecommender.java
> Note: the same error & suggested fix happens in GenericUserBasedRecommender
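
The cancellation described in the report can be reproduced with a small 
stand-alone sketch. This is not the Mahout source or the attached patch: the 
class, the 1..5 cap and the sample similarities are assumptions for 
illustration only.

```java
// Illustrative sketch only; not the code from GenericItemBasedRecommender.
final class DenominatorSketch {
    static final double MIN_PREF = 1.0; // assumed rating scale
    static final double MAX_PREF = 5.0;

    /** useAbs = false sums signed similarities; useAbs = true sums |similarity|. */
    static double estimate(double[] similarities, double[] prefs, boolean useAbs) {
        double weighted = 0.0;
        double total = 0.0;
        for (int i = 0; i < prefs.length; i++) {
            weighted += similarities[i] * prefs[i];
            total += useAbs ? Math.abs(similarities[i]) : similarities[i];
        }
        if (total == 0.0) {
            return Double.NaN;
        }
        // Cap to the rating scale after dividing, as the report describes.
        return Math.max(MIN_PREF, Math.min(MAX_PREF, weighted / total));
    }

    public static void main(String[] args) {
        // One positively and one negatively correlated item.
        double[] sims = {0.5, -0.4};
        double[] prefs = {5.0, 1.0};
        System.out.println("signed sum:   " + estimate(sims, prefs, false));
        System.out.println("absolute sum: " + estimate(sims, prefs, true));
    }
}
```

With the signed sum the denominator shrinks to about 0.1, the quotient grows 
to about 21 and is capped to the maximum; with absolute values the denominator 
is 0.9 and the estimate is about 2.33.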

