For more background on what this project is about, these posts might help:

http://ultrawhizbang.blogspot.com/2010/11/semantic-vectors-part-1.html
http://ultrawhizbang.blogspot.com/2010/11/semantic-vectors-for-recommenders-with.html
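
For anyone who wants the gist before reading the patch, here is a minimal sketch of the idea as described in the quoted comment below. It is not the code in SemanticVectors.patch and it does not use any Mahout API. The function names, the rating-weighted sum, the Gaussian components, and the string-seeded generator are all assumptions made for illustration. Each vector is regenerated from its seed on demand rather than cached, which is the property the comment asks of deterministic random vectors.

```python
import math
import random


def random_vector(kind, entity_id, dims):
    """Return a deterministic pseudo-random vector for one user or item.

    Hypothetical helper: the same (kind, entity_id) always yields the same
    vector, so nothing has to be cached between calls.
    """
    rng = random.Random(f"{kind}:{entity_id}")  # string seeds are hashed deterministically
    return [rng.gauss(0.0, 1.0) for _ in range(dims)]


def item_vectors(ratings, dims):
    """Build one vector per item as the sum of its raters' random vectors.

    Assumption: each user's random vector is weighted by the rating.
    """
    vectors = {}
    for user, item, rating in ratings:
        acc = vectors.setdefault(item, [0.0] * dims)
        for i, x in enumerate(random_vector("user", user, dims)):
            acc[i] += rating * x
    return vectors


def user_vectors(ratings, dims):
    """The 'vice versa' direction: sum the random vectors of rated items."""
    vectors = {}
    for user, item, rating in ratings:
        acc = vectors.setdefault(user, [0.0] * dims)
        for i, x in enumerate(random_vector("item", item, dims)):
            acc[i] += rating * x
    return vectors


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Toy (user, item, rating) triples, made up for this example.
ratings = [
    (1, "A", 5.0), (1, "B", 5.0),
    (2, "A", 4.0), (2, "B", 4.0),
    (3, "C", 5.0),
]

items = item_vectors(ratings, dims=50)
users = user_vectors(ratings, dims=50)

print("A-B distance:", euclidean(items["A"], items["B"]))
print("A-C distance:", euclidean(items["A"], items["C"]))
```

In this toy data, items A and B are rated identically by the same users, so they end up with the same vector, while C is rated by a different user and lands somewhere else. The dims argument is the knob the tuning paragraph in the comment refers to.
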

On Sat, Jul 2, 2011 at 1:11 AM, Lance Norskog (JIRA) <[email protected]> wrote:
>
> [ https://issues.apache.org/jira/browse/MAHOUT-752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058994#comment-13058994 ]
>
> Lance Norskog edited comment on MAHOUT-752 at 7/2/11 8:09 AM:
> --------------------------------------------------------------
>
> Not commit quality, up for peer review. I've used this code for various
> investigations. It is a handy tool.
>
> This lets you turn dual-universe data like User/Item ratings or Document/Term
> collocations into simple geometric vectors that are amenable to Euclidean
> distances. That is: given Users and Items with ratings between the two data
> spaces, create a vector for every Item that is the sum of all interested
> Users. And vice versa. Both sets of vectors are in parallel geometric
> universes with the same scaling; vectors can be compared within one universe
> and also spanning the parallel universe "wall". It's easiest to describe with
> 3D models.
>
> Under the RecommenderEvaluator, a Recommender based on this can go
> head-to-head with a simple KNN recommender and a simple SlopeOne recommender,
> and the distances between the three are roughly triangular. That is, a
> Semantic Vector-based Recommender is as trustworthy as the other two. 100k
> GroupLens ratings was the lab for this comparison.
>
> This requires deterministic/reproducible random vectors, which became
> [MAHOUT-550|https://issues.apache.org/jira/browse/MAHOUT-550]. This
> implementation makes and caches them, which does not scale.
>
> Tuning: the more dimensions, the better the "fidelity" to the ratings
> dataset. It is clear that the vector sets have a lot of "air". With
> RecommenderEvaluator, you can compare it to a few known recommenders and keep
> adding dimensions until it gets in the ballpark. After that, you can compress
> them down via Random Projection. (RP is in fact at the heart of the Semantic
> Vectors algorithm.)
>
>      was (Author: lancenorskog):
>    Not commit quality, up for peer revie
>
>
>> Semantic Vectors: generate and use vectors from User/Item Taste data models
>> ----------------------------------------------------------------------------
>>
>>                 Key: MAHOUT-752
>>                 URL: https://issues.apache.org/jira/browse/MAHOUT-752
>>             Project: Mahout
>>          Issue Type: New Feature
>>          Components: Collaborative Filtering
>>            Reporter: Lance Norskog
>>            Assignee: Sean Owen
>>            Priority: Minor
>>         Attachments: SemanticVectors.patch
>>
>>
>> This package has two parts:
>> # SemanticVectorFactory creates geometric vectors based on non-geometric
>> User/Item ratings.
>> # VectorDataModel stores these and does preference evaluation based on the
>> vectors and a given DistanceMeasure
>> This is a clear explanation of the Semantic Vectors concept:
>> [http://code.google.com/p/semanticvectors/]
>
> --
> This message is automatically generated by JIRA.
> For more information on JIRA, see: http://www.atlassian.com/software/jira
>
>

--
Lance Norskog
[email protected]
