Re: [jira] [Issue Comment Edited] (MAHOUT-752) Semantic Vectors: generate and use vectors from User/Item Taste data models

Lance Norskog Sat, 02 Jul 2011 01:19:02 -0700

For more info on what this project is about, these might help:

http://ultrawhizbang.blogspot.com/2010/11/semantic-vectors-part-1.html
http://ultrawhizbang.blogspot.com/2010/11/semantic-vectors-for-recommenders-with.html


On Sat, Jul 2, 2011 at 1:11 AM, Lance Norskog (JIRA) <[email protected]> wrote:
>
>    [ https://issues.apache.org/jira/browse/MAHOUT-752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058994#comment-13058994 ]
>
> Lance Norskog edited comment on MAHOUT-752 at 7/2/11 8:09 AM:
> --------------------------------------------------------------
>
> Not commit quality, up for peer review. I've used this code for various 
> investigations. It is a handy tool.
>
> This lets you turn dual-universe data like User/Item ratings or Document/Term 
> collocations into simple geometric vectors that are amenable to Euclidean 
> distances. That is: given Users and Items with ratings between the two data 
> spaces, create a vector for every Item that is the sum of all interested 
> Users. And vice versa. Both sets of vectors are in parallel geometric 
> universes with the same scaling; vectors can be compared within one universe 
> and also spanning the parallel universe "wall". It's easiest to describe with 
> 3D models.
>
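
A minimal sketch of the idea in the paragraph above, in plain Python rather than the code in the attached patch. The names random_vector and item_vectors are hypothetical, and weighting each user vector by its rating is an assumption; the comment only says an Item vector is the sum of its interested Users.

```python
import math
import random


def random_vector(key, dims, seed=0):
    """Hypothetical helper: a repeatable pseudo-random vector for one user."""
    rng = random.Random(f"{seed}:{key}")  # string seeds are stable across runs
    return [rng.uniform(-1.0, 1.0) for _ in range(dims)]


def item_vectors(ratings, dims, seed=0):
    """Sum the random vectors of the users who rated each item.

    ratings is an iterable of (user, item, rating) triples. Weighting each
    user vector by the rating is an assumption made for this sketch.
    """
    vectors = {}
    for user, item, rating in ratings:
        user_vec = random_vector(user, dims, seed)
        acc = vectors.setdefault(item, [0.0] * dims)
        for i, component in enumerate(user_vec):
            acc[i] += rating * component
    return vectors


ratings = [
    ("u1", "a", 5.0), ("u2", "a", 3.0),
    ("u1", "b", 5.0), ("u2", "b", 3.0),
    ("u3", "c", 4.0),
]
vecs = item_vectors(ratings, dims=8)
# Items "a" and "b" were rated identically by the same users, so they coincide.
print(math.dist(vecs["a"], vecs["b"]))
# Item "c" was rated by a different user, so it lands somewhere else.
print(math.dist(vecs["a"], vecs["c"]))
```

Swapping the roles of user and item in the same loop would give the User vectors ("and vice versa"); that half is left out to keep the sketch short.
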
> Under the RecommenderEvaluator, a Recommender based on this can go 
> head-to-head with a simple KNN recommender and a simple SlopeOne recommender, 
> and the distances between the three are roughly triangular. That is, a 
> Semantic Vector-based Recommender is as trustworthy as the other two. 100k 
> GroupLens ratings was the lab for this comparison.
>
> This requires deterministic/reproducible random vectors, which became 
> [MAHOUT-550|https://issues.apache.org/jira/browse/MAHOUT-550]. This 
> implementation makes and caches them, which does not scale.
>
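
One way to avoid caching, sketched under assumptions: derive each vector component from a hash of (seed, key, index), so any vector can be rebuilt on demand. This is not the MAHOUT-550 implementation; component and random_vector are hypothetical names, and SHA-256 is just a convenient stable hash.

```python
import hashlib
import struct


def component(seed, key, index):
    """Hypothetical helper: component `index` of the random vector for `key`."""
    digest = hashlib.sha256(f"{seed}:{key}:{index}".encode("utf-8")).digest()
    (n,) = struct.unpack(">Q", digest[:8])  # first 8 bytes as unsigned 64-bit
    return n / 2.0**64 * 2.0 - 1.0          # scaled into roughly [-1, 1]


def random_vector(seed, key, dims):
    """Rebuild the whole vector on demand; nothing is stored between calls."""
    return [component(seed, key, i) for i in range(dims)]


v1 = random_vector(42, "user-17", 4)
v2 = random_vector(42, "user-17", 4)
print(v1 == v2)  # the same inputs always rebuild the same vector
```

The trade-off is CPU for memory: every lookup recomputes hashes, but no per-user or per-item vector has to be kept around.
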
> Tuning: the more dimensions, the better the "fidelity" to the ratings 
> dataset. It is clear that the vector sets have a lot of "air". With 
> RecommenderEvaluator, you can compare it to a few known recommenders and keep 
> adding dimensions until it gets in the ballpark. After that, you can compress 
> them down via Random Projection. (RP is in fact at the heart of the Semantic 
> Vectors algorithm.)
>
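
A rough sketch of the compression step, assuming a plain Gaussian random projection; the patch may do this differently. projection_matrix and project are hypothetical names, and the 256 and 64 dimensions are arbitrary.

```python
import math
import random


def projection_matrix(in_dims, out_dims, seed=0):
    """Gaussian random matrix, scaled so distances are roughly preserved."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(out_dims)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(in_dims)]
            for _ in range(out_dims)]


def project(matrix, vector):
    """Multiply the projection matrix by one vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


# Two stand-in high-dimensional vectors (made-up data, not real ratings).
rng = random.Random(1)
a = [rng.uniform(-1.0, 1.0) for _ in range(256)]
b = [rng.uniform(-1.0, 1.0) for _ in range(256)]

matrix = projection_matrix(256, 64)
pa, pb = project(matrix, a), project(matrix, b)

# Compare the distance before and after compressing 256 dims down to 64.
print(math.dist(a, b), math.dist(pa, pb))
```

The same matrix has to be applied to both the User and the Item vectors, otherwise distances across the two sets stop being comparable.
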
>      was (Author: lancenorskog):
>    Not commit quality, up for peer revie
>
>
>
>> Semantic Vectors: generate and use vectors from User/Item Taste data models
>> ----------------------------------------------------------------------------
>>
>>                 Key: MAHOUT-752
>>                 URL: https://issues.apache.org/jira/browse/MAHOUT-752
>>             Project: Mahout
>>          Issue Type: New Feature
>>          Components: Collaborative Filtering
>>            Reporter: Lance Norskog
>>            Assignee: Sean Owen
>>            Priority: Minor
>>         Attachments: SemanticVectors.patch
>>
>>
>> This package has two parts:
>> # SemanticVectorFactory creates geometric vectors based on non-geometric 
>> User/Item ratings.
>> # VectorDataModel stores these and does preference evaluation based on the 
>> vectors and a given DistanceMeasure
>> This is a clear explanation of the Semantic Vectors concept: 
>> [http://code.google.com/p/semanticvectors/]
>
> --
> This message is automatically generated by JIRA.
> For more information on JIRA, see: http://www.atlassian.com/software/jira
>
>
>



-- 
Lance Norskog
[email protected]
