[ https://issues.apache.org/jira/browse/MAHOUT-752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13066571#comment-13066571 ]

Lance Norskog edited comment on MAHOUT-752 at 7/17/11 3:56 AM:
---------------------------------------------------------------

bq. The idea seems to duplicate, in different code, the existing in-memory data model with similarity metrics.
The MemoryDiffStorage class, you mean? SV is more fluid: it can do both user/item and item/item.
Semantic Vectors produces a standard data format, and so is usable in different ways.
These vectors, downsized with random projection, carry the same information in a lot less memory.
The generated vectors have a "geometric" nature (sort of), and so have a couple of interesting properties:
* They cluster well under Euclidean distance.
* Random projection downsizes a 200-dimensional matrix to 2 dimensions, and the resulting chart actually makes sense.
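
To make the downsizing concrete, here is a minimal sketch of random projection (not the Mahout or Semantic Vectors API — the class and method names are hypothetical): multiply each high-dimensional vector by a random Gaussian matrix scaled by 1/sqrt(k). By the Johnson-Lindenstrauss lemma, pairwise Euclidean distances are approximately preserved, which is why the projected vectors still cluster sensibly.

```java
import java.util.Random;

// Hypothetical illustration, not a Mahout class: reduce a d-dimensional
// vector to k dimensions with a random Gaussian projection matrix.
public class RandomProjectionSketch {

    // Multiply the input vector by each row of the random basis.
    static double[] project(double[] input, double[][] basis) {
        double[] out = new double[basis.length];
        for (int i = 0; i < basis.length; i++) {
            double sum = 0.0;
            for (int j = 0; j < input.length; j++) {
                sum += basis[i][j] * input[j];
            }
            out[i] = sum;
        }
        return out;
    }

    public static void main(String[] args) {
        int d = 200, k = 2;                 // 200 dims down to a 2-D chart
        Random rng = new Random(42);

        // Random Gaussian basis, scaled so expected norms are preserved.
        double[][] basis = new double[k][d];
        for (int i = 0; i < k; i++)
            for (int j = 0; j < d; j++)
                basis[i][j] = rng.nextGaussian() / Math.sqrt(k);

        // A stand-in for one row of a user/item rating matrix.
        double[] ratings = new double[d];
        for (int j = 0; j < d; j++) ratings[j] = rng.nextDouble();

        double[] projected = project(ratings, basis);
        System.out.println(projected.length); // prints 2
    }
}
```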
 
Also, since they are generated by summing random numbers, the vectors tend toward a Gaussian distribution (by the central limit theorem), no matter what the input set looks like.
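
A quick illustration of that Gaussian tendency (a standalone sketch, not part of the patch): sum many uniform random values per coordinate, and the sample mean and variance come out close to what the central limit theorem predicts.

```java
import java.util.Random;

// Standalone demonstration: a sum of n independent uniforms on
// [-0.5, 0.5] approaches a Gaussian with mean 0 and variance n/12.
public class CltSketch {

    static double sumOfUniforms(Random rng, int n) {
        double s = 0.0;
        for (int i = 0; i < n; i++) {
            s += rng.nextDouble() - 0.5;
        }
        return s;
    }

    public static void main(String[] args) {
        Random rng = new Random(1);
        int samples = 100_000, n = 30;

        double mean = 0.0, sumSq = 0.0;
        for (int i = 0; i < samples; i++) {
            double v = sumOfUniforms(rng, n);
            mean += v;
            sumSq += v * v;
        }
        mean /= samples;
        double variance = sumSq / samples - mean * mean;

        // Theory predicts mean near 0 and variance near 30/12 = 2.5.
        System.out.printf("mean=%.3f variance=%.3f%n", mean, variance);
    }
}
```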

It is also really effective in map/reduce chains, because the generated vectors are fully specified by the input user/item matrix and a formula, so the computation can be deferred (or never done at all).

You're right about the forum. Recommenders seemed a good way to use this, but 
it also works with text collocation.



> Semantic Vectors: generate and use vectors from User/Item Taste data models 
> ----------------------------------------------------------------------------
>
>                 Key: MAHOUT-752
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-752
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Collaborative Filtering
>            Reporter: Lance Norskog
>            Assignee: Sean Owen
>            Priority: Minor
>         Attachments: SemanticVectors.patch
>
>
> This package has two parts:
> # SemanticVectorFactory creates geometric vectors based on non-geometric 
> User/Item ratings.
> # VectorDataModel stores these and does preference evaluation based on the 
> vectors and a given DistanceMeasure
> This is a large exploration of the Semantic Vectors concept: 
> [http://code.google.com/p/semanticvectors/]. And was the inspiration for this 
> project.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
