[ 
https://issues.apache.org/jira/browse/MAHOUT-752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058994#comment-13058994
 ] 

Lance Norskog edited comment on MAHOUT-752 at 7/2/11 8:09 AM:
--------------------------------------------------------------

Not commit quality, up for peer review. I've used this code for various 
investigations. It is a handy tool.

This lets you turn dual-universe data, such as User/Item ratings or Document/Term 
collocations, into simple geometric vectors that are amenable to Euclidean 
distance. That is: given Users and Items with ratings between the two data 
spaces, create a vector for every Item that is the sum of all Users who rated 
it, and vice versa for every User. Both sets of vectors live in parallel 
geometric universes with the same scaling, so vectors can be compared within one 
universe and also across the "wall" between the two universes (a User against an 
Item). It's easiest to describe with 3D models.
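A minimal plain-Java sketch of the construction above, with no Mahout dependency. Assumptions not in the original: each User starts from a sparse random "elemental" vector seeded by its ID (as in random indexing), an Item vector is the rating-weighted sum of the elemental vectors of the Users who rated it, and, as one reading of "vice versa", each User vector is the rating-weighted sum of the Item vectors it rated, so both sets share one space. Unit-length normalization stands in for "the same scaling". All names (`SemanticVectorSketch`, `elemental`, `itemVectors`, `userVectors`) are hypothetical, not the patch's API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

public class SemanticVectorSketch {

    /** One User/Item rating; a stand-in for a Taste DataModel in this sketch. */
    static final class Rating {
        final long user;
        final long item;
        final double value;

        Rating(long user, long item, double value) {
            this.user = user;
            this.item = item;
            this.value = value;
        }
    }

    /** Sparse ternary random vector seeded by the ID: the same ID always yields the same vector. */
    static double[] elemental(long id, int dims, int nonZeros) {
        Random rnd = new Random(id);
        double[] v = new double[dims];
        for (int i = 0; i < nonZeros; i++) {
            v[rnd.nextInt(dims)] += rnd.nextBoolean() ? 1.0 : -1.0;
        }
        return v;
    }

    static void addScaled(double[] acc, double[] v, double scale) {
        for (int i = 0; i < acc.length; i++) {
            acc[i] += scale * v[i];
        }
    }

    /** Scales every non-zero vector to unit length so both universes share the same scaling. */
    static void normalizeAll(Map<Long, double[]> vectors) {
        for (double[] v : vectors.values()) {
            double norm = 0.0;
            for (double x : v) norm += x * x;
            norm = Math.sqrt(norm);
            if (norm > 0.0) {
                for (int i = 0; i < v.length; i++) v[i] /= norm;
            }
        }
    }

    /** Item vector = rating-weighted sum of the elemental vectors of the Users who rated it. */
    static Map<Long, double[]> itemVectors(List<Rating> ratings, int dims, int nonZeros) {
        Map<Long, double[]> items = new HashMap<>();
        for (Rating r : ratings) {
            double[] acc = items.computeIfAbsent(r.item, k -> new double[dims]);
            addScaled(acc, elemental(r.user, dims, nonZeros), r.value);
        }
        normalizeAll(items);
        return items;
    }

    /** User vector = rating-weighted sum of the Item vectors the User rated, in the same space. */
    static Map<Long, double[]> userVectors(List<Rating> ratings, Map<Long, double[]> items, int dims) {
        Map<Long, double[]> users = new HashMap<>();
        for (Rating r : ratings) {
            double[] acc = users.computeIfAbsent(r.user, k -> new double[dims]);
            addScaled(acc, items.get(r.item), r.value);
        }
        normalizeAll(users);
        return users;
    }

    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        List<Rating> ratings = List.of(
                new Rating(1, 100, 5.0), new Rating(1, 101, 4.0),
                new Rating(2, 100, 4.0), new Rating(2, 102, 1.0),
                new Rating(3, 102, 5.0));
        int dims = 64;
        Map<Long, double[]> items = itemVectors(ratings, dims, 8);
        Map<Long, double[]> users = userVectors(ratings, items, dims);
        // Within one universe (Item to Item) and across the "wall" (User to Item).
        System.out.println("item 100 vs item 101: " + euclidean(items.get(100L), items.get(101L)));
        System.out.println("user 1 vs item 100:   " + euclidean(users.get(1L), items.get(100L)));
    }
}
```

Because User vectors are built from the already-normalized Item vectors, a User and the Items it rated heavily end up close together, which is what makes the cross-"wall" comparison meaningful in this sketch.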

Under RecommenderEvaluator, a Recommender based on these vectors can go 
head-to-head with a simple KNN recommender and a simple SlopeOne recommender, and 
the distances between the three evaluation scores are roughly triangular. That 
is, a Semantic Vector-based Recommender is as trustworthy as the other two. The 
100K GroupLens ratings dataset was the testbed for this comparison.

This requires deterministic, reproducible random vectors, which became 
[MAHOUT-550|https://issues.apache.org/jira/browse/MAHOUT-550]. This 
implementation generates and caches them, which does not scale.
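Caching one random vector per ID means memory grows with the number of Users and Items. A sketch of the usual alternative, regenerating each vector on demand from a seed derived from its ID so nothing has to be stored; this is an assumption for illustration, not necessarily how MAHOUT-550 does it. `java.util.Random` documents its exact algorithm, so a given seed reproduces the same sequence. The names `seedFor`, `vectorFor`, and the `salt` parameter (which keeps User IDs and Item IDs on separate streams) are hypothetical.

```java
import java.util.Arrays;
import java.util.Random;

public class ReproducibleRandomVectors {

    /**
     * SplitMix64-style mixing: turns nearby IDs (1, 2, 3, ...) into well-spread seeds.
     * Each step is a bijection, so distinct (id, salt) inputs give distinct seeds.
     */
    static long seedFor(long id, long salt) {
        long z = id + salt * 0x9E3779B97F4A7C15L;
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    /** Regenerates the same Gaussian vector for the same (id, salt) on every call; nothing is cached. */
    static double[] vectorFor(long id, long salt, int dims) {
        Random rnd = new Random(seedFor(id, salt));
        double[] v = new double[dims];
        for (int i = 0; i < dims; i++) {
            v[i] = rnd.nextGaussian();
        }
        return v;
    }

    public static void main(String[] args) {
        long userSalt = 1L;
        double[] first = vectorFor(12345L, userSalt, 8);
        double[] again = vectorFor(12345L, userSalt, 8);
        System.out.println(Arrays.equals(first, again)); // true: same ID, same vector
    }
}
```

The trade is CPU for memory: each lookup re-draws `dims` numbers instead of reading a cached array.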

Tuning: the more dimensions, the better the "fidelity" to the ratings dataset. 
The vector sets clearly contain a lot of "air". With RecommenderEvaluator, you 
can compare this recommender to a few known recommenders and keep adding 
dimensions until its score is in the same ballpark. After that, you can compress 
the vectors down via Random Projection. (RP is in fact at the heart of the 
Semantic Vectors algorithm.)
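A sketch of the compression step, assuming a dense Gaussian random projection with entries drawn from N(0, 1/k), one common Johnson-Lindenstrauss construction that preserves squared Euclidean length in expectation; the patch may use a different RP variant. `projectionMatrix` and `project` are hypothetical names.

```java
import java.util.Random;

public class RandomProjectionSketch {

    /** Builds a d x k matrix with entries N(0, 1/k); the fixed seed makes it reproducible. */
    static double[][] projectionMatrix(int d, int k, long seed) {
        Random rnd = new Random(seed);
        double scale = 1.0 / Math.sqrt(k);
        double[][] r = new double[d][k];
        for (int i = 0; i < d; i++) {
            for (int j = 0; j < k; j++) {
                r[i][j] = rnd.nextGaussian() * scale;
            }
        }
        return r;
    }

    /** Maps one d-dimensional vector to k dimensions: y = x R. */
    static double[] project(double[] x, double[][] r) {
        int k = r[0].length;
        double[] y = new double[k];
        for (int i = 0; i < x.length; i++) {
            if (x[i] == 0.0) continue; // skip empty dimensions of sparse vectors
            for (int j = 0; j < k; j++) {
                y[j] += x[i] * r[i][j];
            }
        }
        return y;
    }

    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        int d = 1000, k = 100;
        Random data = new Random(7);
        double[] a = new double[d];
        double[] b = new double[d];
        for (int i = 0; i < d; i++) {
            a[i] = data.nextGaussian();
            b[i] = data.nextGaussian();
        }
        // Reuse the SAME matrix for User and Item vectors so they stay comparable.
        double[][] r = projectionMatrix(d, k, 42L);
        System.out.printf("before: %.3f  after: %.3f%n",
                euclidean(a, b), euclidean(project(a, r), project(b, r)));
    }
}
```

The two printed distances should be close but not identical; the gap shrinks as `k` grows.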



  
> Semantic Vectors: generate and use vectors from User/Item Taste data models 
> ----------------------------------------------------------------------------
>
>                 Key: MAHOUT-752
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-752
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Collaborative Filtering
>            Reporter: Lance Norskog
>            Assignee: Sean Owen
>            Priority: Minor
>         Attachments: SemanticVectors.patch
>
>
> This package has two parts:
> # SemanticVectorFactory creates geometric vectors based on non-geometric 
> User/Item ratings.
> # VectorDataModel stores these and does preference evaluation based on the 
> vectors and a given DistanceMeasure.
> A clear explanation of the Semantic Vectors concept is here: 
> [http://code.google.com/p/semanticvectors/]


        
