[
https://issues.apache.org/jira/browse/MAHOUT-752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058994#comment-13058994
]
Lance Norskog edited comment on MAHOUT-752 at 7/2/11 8:09 AM:
--------------------------------------------------------------
Not commit quality, up for peer review. I've used this code for various
investigations. It is a handy tool.
This lets you turn dual-universe data like User/Item ratings or Document/Term
collocations into simple geometric vectors that are amenable to Euclidean
distances. That is: given Users and Items with ratings between the two data
spaces, create a vector for every Item that is the sum of the vectors of all
Users who rated it, and vice versa. Both sets of vectors are in parallel geometric universes with
the same scaling; vectors can be compared within one universe and also spanning
the parallel universe "wall". It's easiest to describe with 3D models.
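As a rough illustration of the paragraph above (not the code in the patch), here is a minimal Python sketch. It assumes each User gets a cached random vector, and that an Item vector is the rating-weighted sum of the vectors of the Users who rated it. The toy ratings, the dimension count, the rating weighting, and every function name here are made up for the example.

```python
import math
import random

# Hypothetical toy ratings as (user, item, rating); not data from the patch.
RATINGS = [
    ("u1", "i1", 5.0), ("u1", "i2", 3.0),
    ("u2", "i1", 4.0), ("u3", "i2", 1.0),
]
DIMENSIONS = 8  # assumed value, chosen only to keep the example small

_rng = random.Random(42)  # fixed seed so repeated runs give the same vectors
_user_cache = {}          # create-and-cache, as the comment describes


def user_vector(user):
    """Return the cached random vector for a user, creating it on first use."""
    if user not in _user_cache:
        _user_cache[user] = [_rng.uniform(-1.0, 1.0) for _ in range(DIMENSIONS)]
    return _user_cache[user]


def item_vectors(ratings):
    """Build each item vector as the rating-weighted sum of its users' vectors."""
    vectors = {}
    for user, item, rating in ratings:
        acc = vectors.setdefault(item, [0.0] * DIMENSIONS)
        for d, x in enumerate(user_vector(user)):
            acc[d] += rating * x
    return vectors


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


items = item_vectors(RATINGS)
distance = euclidean(items["i1"], items["i2"])
```

Swapping the roles of Users and Items in the same loop would give the "vice versa" direction.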
Under the RecommenderEvaluator, a Recommender based on this can go head-to-head
with a simple KNN recommender and a simple SlopeOne recommender, and the
distances between the three are roughly triangular. That is, a Semantic
Vector-based Recommender is as trustworthy as the other two. The 100k GroupLens
ratings dataset was the test bed for this comparison.
This requires deterministic/reproducible random vectors, a need that became
[MAHOUT-550|https://issues.apache.org/jira/browse/MAHOUT-550]. This
implementation generates and caches them, which does not scale.
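One way to get reproducible random vectors without a cache is to derive the random seed from the ID itself, so a vector can be regenerated whenever it is needed. The Python sketch below only illustrates that idea; it is not what MAHOUT-550 or the patch does. The function name, the salt parameter, and the choice of SHA-256 and Gaussian components are all assumptions.

```python
import hashlib
import random


def seeded_vector(entity_id, dimensions, salt="users"):
    """Regenerate the same random vector for the same (salt, entity_id) pair."""
    # Hash the ID into a stable 64-bit seed; unlike Python's built-in hash(),
    # SHA-256 gives the same value in every process.
    digest = hashlib.sha256(f"{salt}:{entity_id}".encode("utf-8")).digest()
    seed = int.from_bytes(digest[:8], "big")
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(dimensions)]


first = seeded_vector(1234, 16)
again = seeded_vector(1234, 16)  # recomputed rather than looked up
other = seeded_vector(1235, 16)
```

The trade-off is recomputation time in place of memory: nothing is stored per User or Item, but every lookup reruns the generator.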
Tuning: the more dimensions, the better the "fidelity" to the ratings dataset.
It is clear that the vector sets have a lot of "air". With
RecommenderEvaluator, you can compare it to a few known recommenders and keep
adding dimensions until its scores are in the same ballpark. After that, you can
compress the vectors down via Random Projection. (RP is in fact at the heart of
the Semantic Vectors algorithm.)
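For readers unfamiliar with Random Projection, here is a minimal Python sketch of the compression step, assuming a dense Gaussian projection matrix scaled by 1/sqrt(k). The dimension counts, the seed, and the function names are illustrative and are not taken from the patch.

```python
import math
import random


def projection_matrix(source_dims, target_dims, seed=0):
    """Build a target_dims x source_dims matrix of scaled Gaussian entries."""
    rng = random.Random(seed)
    # Scaling by 1/sqrt(target_dims) keeps expected squared lengths unchanged.
    scale = 1.0 / math.sqrt(target_dims)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(source_dims)]
            for _ in range(target_dims)]


def project(matrix, vector):
    """Multiply the matrix by the vector to get the lower-dimensional vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


# Hypothetical sizes: compress 200-dimensional vectors down to 20 dimensions.
matrix = projection_matrix(200, 20)
wide = [float(i % 7) for i in range(200)]
compact = project(matrix, wide)
```

The same matrix has to be applied to every User and Item vector so that distances between the compressed vectors stay comparable.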
> Semantic Vectors: generate and use vectors from User/Item Taste data models
> ----------------------------------------------------------------------------
>
> Key: MAHOUT-752
> URL: https://issues.apache.org/jira/browse/MAHOUT-752
> Project: Mahout
> Issue Type: New Feature
> Components: Collaborative Filtering
> Reporter: Lance Norskog
> Assignee: Sean Owen
> Priority: Minor
> Attachments: SemanticVectors.patch
>
>
> This package has two parts:
> # SemanticVectorFactory creates geometric vectors based on non-geometric
> User/Item ratings.
> # VectorDataModel stores these vectors and does preference evaluation based on
> them and a given DistanceMeasure.
> This is a clear explanation of the Semantic Vectors concept:
> [http://code.google.com/p/semanticvectors/]