[ 
https://issues.apache.org/jira/browse/MAHOUT-387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated MAHOUT-387:
-----------------------------

           Status: Resolved  (was: Patch Available)
         Assignee: Sean Owen
    Fix Version/s: 0.3
       Resolution: Won't Fix

Yes, like Jeff said, this actually exists as PearsonCorrelationSimilarity. In 
the case where the mean of each series is 0, the result is the same. The 
fastest way I know to see this is to just look at this form of the sample 
correlation: 
http://upload.wikimedia.org/math/c/a/6/ca68fbe94060a2591924b380c9bc4e27.png ... 
and note that sum(xi) = sum(yi) = 0 when the means of xi and yi are 0. You're 
left with sum(xi*yi) in the numerator, which is the dot product, and 
sqrt(sum(xi^2)) * sqrt(sum(yi^2)) in the denominator, which is the product of 
the vector lengths. This is just the cosine of the angle between x and y.
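
To make the argument above concrete, here is a minimal sketch. It is not 
Mahout's PearsonCorrelationSimilarity; the class and method names are 
hypothetical, and it assumes the linked image shows the usual sum-based form 
of the sample correlation. It centers two made-up rating vectors and computes 
both quantities so they can be compared.

```java
// Hypothetical illustration only; not taken from Mahout or the attached patch.
final class CenteredCosineSketch {

  /** Sample Pearson correlation in the sum-based form (assumed to match the linked formula). */
  static double pearson(double[] x, double[] y) {
    int n = x.length;
    double sumX = 0, sumY = 0, sumXY = 0, sumX2 = 0, sumY2 = 0;
    for (int i = 0; i < n; i++) {
      sumX += x[i];
      sumY += y[i];
      sumXY += x[i] * y[i];
      sumX2 += x[i] * x[i];
      sumY2 += y[i] * y[i];
    }
    double numerator = n * sumXY - sumX * sumY;
    double denominator =
        Math.sqrt(n * sumX2 - sumX * sumX) * Math.sqrt(n * sumY2 - sumY * sumY);
    return numerator / denominator;
  }

  /** Plain cosine of the angle between x and y: dot product over product of lengths. */
  static double cosine(double[] x, double[] y) {
    double dot = 0, sumX2 = 0, sumY2 = 0;
    for (int i = 0; i < x.length; i++) {
      dot += x[i] * y[i];
      sumX2 += x[i] * x[i];
      sumY2 += y[i] * y[i];
    }
    return dot / (Math.sqrt(sumX2) * Math.sqrt(sumY2));
  }

  /** Returns a copy of v with its mean subtracted from every element. */
  static double[] center(double[] v) {
    double mean = 0;
    for (double value : v) {
      mean += value;
    }
    mean /= v.length;
    double[] centered = new double[v.length];
    for (int i = 0; i < v.length; i++) {
      centered[i] = v[i] - mean;
    }
    return centered;
  }

  public static void main(String[] args) {
    // Made-up ratings, centered so that each series has mean 0.
    double[] x = center(new double[] {5, 3, 4, 1});
    double[] y = center(new double[] {4, 2, 5, 2});
    System.out.println("pearson: " + pearson(x, y));
    System.out.println("cosine:  " + cosine(x, y));
  }
}
```

With sum(xi) = sum(yi) = 0, the factors of n in this form cancel between 
numerator and denominator, so the two printed values should agree up to 
floating-point rounding.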

One can argue whether forcing the data to be centered is right. I think it's a 
good thing in all cases. It adjusts for a user's tendency to rate high or low 
on average. It also makes the computation simpler, and more consistent with 
Pearson (well, it makes it identical!). This has a good treatment:
http://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient#Geometric_interpretation
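
As a rough illustration of the point about a user's tendency to rate high or 
low, the sketch below compares cosine similarity with and without centering 
for two made-up users who rank three items in opposite order, where one rates 
high on average and the other low. The class name and data are hypothetical 
and are not part of Mahout or the patch.

```java
// Hypothetical illustration only; the users and ratings are invented.
final class CenteringEffectSketch {

  /** Cosine of the angle between x and y. */
  static double cosine(double[] x, double[] y) {
    double dot = 0, sumX2 = 0, sumY2 = 0;
    for (int i = 0; i < x.length; i++) {
      dot += x[i] * y[i];
      sumX2 += x[i] * x[i];
      sumY2 += y[i] * y[i];
    }
    return dot / (Math.sqrt(sumX2) * Math.sqrt(sumY2));
  }

  /** Returns a copy of v with its mean subtracted from every element. */
  static double[] center(double[] v) {
    double mean = 0;
    for (double value : v) {
      mean += value;
    }
    mean /= v.length;
    double[] centered = new double[v.length];
    for (int i = 0; i < v.length; i++) {
      centered[i] = v[i] - mean;
    }
    return centered;
  }

  public static void main(String[] args) {
    double[] lowRater = {1, 2, 3};   // average rating 2, prefers the third item
    double[] highRater = {5, 4, 3};  // average rating 4, prefers the first item
    System.out.println("uncentered: " + cosine(lowRater, highRater));
    System.out.println("centered:   " + cosine(center(lowRater), center(highRater)));
  }
}
```

Without centering, the all-positive ratings give a clearly positive cosine 
even though the two users disagree on every ordering; after centering, the 
value is close to -1.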

Only for this reason I'd mark this as won't-fix for the moment; the patch is 
otherwise nice. I'd personally like to hear more about why not to center, if 
there's an argument for it.

> Cosine item similarity implementation
> -------------------------------------
>
>                 Key: MAHOUT-387
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-387
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Collaborative Filtering
>            Reporter: Sebastian Schelter
>            Assignee: Sean Owen
>             Fix For: 0.3
>
>         Attachments: MAHOUT-387.patch
>
>
> I needed to compute the cosine similarity between two items when running 
> org.apache.mahout.cf.taste.hadoop.pseudo.RecommenderJob. I couldn't find an 
> implementation (did I overlook it, maybe?), so I created my own. I want to 
> share it here in case you find it useful.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
