[
https://issues.apache.org/jira/browse/MAHOUT-389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12864334#action_12864334
]
Sean Owen commented on MAHOUT-389:
----------------------------------
De-emphasizing common items is often desirable, though achieving it this way
may be a case of mixing two concerns that can be kept separate. That is, you
can de-emphasize them more directly with, for example, the provided
InverseUserFrequency transformation.
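
For readers unfamiliar with the idea, here is a minimal sketch of inverse user
frequency weighting. It is not Mahout's InverseUserFrequency class: the class
and method names below are hypothetical, and the weight log(totalUsers /
usersWithItem) is assumed as the usual textbook form.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not part of Mahout.
final class InverseUserFrequencySketch {

  // Weight for one item: log(totalUsers / usersWithItem).
  // An item that every user has gets weight 0; rarer items get larger weights.
  static double weight(int totalUsers, int usersWithItem) {
    if (usersWithItem <= 0 || usersWithItem > totalUsers) {
      throw new IllegalArgumentException("usersWithItem must be in [1, totalUsers]");
    }
    return Math.log((double) totalUsers / usersWithItem);
  }

  // Scales each preference value by the weight of its item.
  // Every item in prefs is assumed to have an entry in userCounts.
  static Map<Long, Double> reweight(Map<Long, Double> prefs,
                                    Map<Long, Integer> userCounts,
                                    int totalUsers) {
    Map<Long, Double> weighted = new HashMap<>();
    for (Map.Entry<Long, Double> e : prefs.entrySet()) {
      Integer count = userCounts.get(e.getKey());
      if (count == null) {
        throw new IllegalArgumentException("no user count for item " + e.getKey());
      }
      weighted.put(e.getKey(), e.getValue() * weight(totalUsers, count));
    }
    return weighted;
  }
}
```

With weights like these, the de-emphasis of common items is a separate step
from the similarity computation itself.
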
But doubtless there are times when this version of the computation is better,
even best. For example, when your rating is really the number of times a user
has viewed an item, it's absolutely correct to treat missing items as rated 0,
because they really are 0, and adding that information is even better. You
could also get this effect with a PreferenceInferrer. (See, I've thought of
everything.)
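
To make the difference between the two variants concrete, here is a minimal
sketch. It is not the Mahout code or the attached patch: the class and method
names are hypothetical, and preferences are assumed to be plain maps from item
ID to value. One method restricts the cosine to items both users have a value
for; the other treats missing items as 0, so every preference contributes to
the norms.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not part of Mahout.
final class CosineSketch {

  // Cosine over co-rated items only: items missing from either user are skipped,
  // in the dot product and in both norms.
  static double cosineOverCoRated(Map<Long, Double> x, Map<Long, Double> y) {
    double dot = 0.0;
    double sumX2 = 0.0;
    double sumY2 = 0.0;
    for (Map.Entry<Long, Double> e : x.entrySet()) {
      Double other = y.get(e.getKey());
      if (other != null) {
        dot += e.getValue() * other;
        sumX2 += e.getValue() * e.getValue();
        sumY2 += other * other;
      }
    }
    return ratio(dot, sumX2, sumY2);
  }

  // Cosine with missing items treated as 0: they add nothing to the dot product,
  // but each user's norm is taken over all of that user's preferences.
  static double cosineMissingAsZero(Map<Long, Double> x, Map<Long, Double> y) {
    double dot = 0.0;
    for (Map.Entry<Long, Double> e : x.entrySet()) {
      Double other = y.get(e.getKey());
      if (other != null) {
        dot += e.getValue() * other;
      }
    }
    return ratio(dot, sumOfSquares(x), sumOfSquares(y));
  }

  private static double sumOfSquares(Map<Long, Double> prefs) {
    double sum = 0.0;
    for (double v : prefs.values()) {
      sum += v * v;
    }
    return sum;
  }

  // Returns NaN when either vector has zero length, since the cosine is undefined.
  private static double ratio(double dot, double sumX2, double sumY2) {
    if (sumX2 == 0.0 || sumY2 == 0.0) {
      return Double.NaN;
    }
    return dot / Math.sqrt(sumX2 * sumY2);
  }

  public static void main(String[] args) {
    // View counts: both users viewed items 1 and 2; only user B viewed item 3.
    Map<Long, Double> a = new HashMap<>();
    a.put(1L, 3.0);
    a.put(2L, 4.0);
    Map<Long, Double> b = new HashMap<>();
    b.put(1L, 3.0);
    b.put(2L, 4.0);
    b.put(3L, 12.0);
    System.out.println(cosineOverCoRated(a, b));   // identical on the shared items
    System.out.println(cosineMissingAsZero(a, b)); // lower: item 3 counts against the match
  }
}
```

In this example the co-rated variant sees two identical users, while the
missing-as-zero variant gives 25 / (5 * 13), about 0.385, because user B's 12
views of item 3 enter B's norm.
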
But I don't really mind adding more variations on the similarity computation,
to give people options. As long as it fits in nicely, doesn't really complicate
the code or confuse readers, and doesn't have any material performance impact,
why not? I will look at the patch a bit later and see what you're up to.
> UncenteredCosineSimilarity
> ---------------------------
>
> Key: MAHOUT-389
> URL: https://issues.apache.org/jira/browse/MAHOUT-389
> Project: Mahout
> Issue Type: Improvement
> Components: Collaborative Filtering
> Reporter: Sebastian Schelter
> Priority: Minor
> Attachments: MAHOUT-389-2.patch, MAHOUT-389-3.patch, MAHOUT-389.patch
>
>
> org.apache.mahout.cf.taste.impl.similarity.UncenteredCosineSimilarity only
> computes the cosine between those components of the vectors where both
> vectors have a value greater than zero.
> This is inconsistent with the definition of the cosine (correct me if I'm
> wrong) and with the distributed cosine similarity computation.