[ https://issues.apache.org/jira/browse/MAHOUT-752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13066571#comment-13066571 ]
Lance Norskog edited comment on MAHOUT-752 at 7/17/11 3:56 AM:
---------------------------------------------------------------
bq. The idea seems to duplicate, in different code, the existing in-memory data
model with similarity metrics.
The MemoryDiffStorage class, you mean? Semantic Vectors (SV) is more
flexible: it can do both user/item and item/item similarity.
Semantic vectors use a standard data format, and so are usable in different
ways. Downsized with random projection, they carry roughly the same
information in far less memory.
The generated vectors have a "geometric" nature (sort of), and so have a couple
of interesting properties:
* They cluster well with Euclidean distance.
* Random projection can downsize 200-dimensional vectors all the way to 2-D,
and the resulting chart actually makes sense (see the sketch after this
list).
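Here is a minimal sketch of that projection step in plain Java. Nothing below
is from the attached patch; the Gaussian entries and the 1/sqrt(k) scaling are
the usual Johnson-Lindenstrauss choices:
{code:java}
import java.util.Random;

// Johnson-Lindenstrauss style random projection: multiply a d-dimensional
// vector by a k x d matrix of Gaussian entries, scaled by 1/sqrt(k) so that
// Euclidean distances are approximately preserved.
public class RandomProjection {
  private final double[][] basis; // k x d projection matrix

  public RandomProjection(int d, int k, long seed) {
    Random rng = new Random(seed);
    basis = new double[k][d];
    double scale = 1.0 / Math.sqrt(k);
    for (int i = 0; i < k; i++) {
      for (int j = 0; j < d; j++) {
        basis[i][j] = rng.nextGaussian() * scale;
      }
    }
  }

  public double[] project(double[] v) {
    double[] out = new double[basis.length];
    for (int i = 0; i < basis.length; i++) {
      double sum = 0.0;
      for (int j = 0; j < v.length; j++) {
        sum += basis[i][j] * v[j];
      }
      out[i] = sum;
    }
    return out;
  }
}
{code}
Projecting with {{new RandomProjection(200, 2, seed)}} is what gives the 2-D
charts; the approximate distance preservation is also why the projected
vectors still cluster well under Euclidean distance.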
Also, since they are generated by summing random numbers, the vectors lean
towards Gaussian distributions no matter what the input set looks like; that
is just the central limit theorem at work.
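A throwaway check of that claim (again, not from the patch): sum a few
hundred uniform random numbers repeatedly, and the histogram of the sums
comes out bell-shaped:
{code:java}
import java.util.Random;

// Central limit theorem in action: sums of 200 uniform random numbers
// (mean 100, stddev ~4.1) pile up in a near-Gaussian bell curve.
public class CltCheck {
  public static void main(String[] args) {
    Random rng = new Random(42);
    int[] histogram = new int[20]; // buckets of width 1.5 covering 85..115
    for (int trial = 0; trial < 100000; trial++) {
      double sum = 0.0;
      for (int i = 0; i < 200; i++) {
        sum += rng.nextDouble();
      }
      int bucket = (int) ((sum - 85.0) / 1.5);
      if (bucket >= 0 && bucket < histogram.length) {
        histogram[bucket]++;
      }
    }
    for (int count : histogram) {
      StringBuilder bar = new StringBuilder();
      for (int i = 0; i < count / 500; i++) {
        bar.append('*');
      }
      System.out.println(bar);
    }
  }
}
{code}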
It is also really effective in map/reduce chains, because the generated
vectors are fully specified by the input user/item matrix and a formula, so
the computation can be deferred (or never done at all).
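A sketch of why the computation can be deferred: if every item's random
vector is a pure function of (itemID, seed), any user vector can be rebuilt
on demand from its preference row alone. The names below are hypothetical;
the patch may organize this differently:
{code:java}
import java.util.Random;

// Because each item's random vector is fully determined by (itemID, seed),
// user vectors never have to be materialized up front: any mapper in a
// map/reduce chain can regenerate them on demand from the preference row.
public class DeferredVectors {
  private static final int DIMENSIONS = 200;
  private static final long SEED = 12345L;

  // Deterministic random vector for an item: same itemID, same vector.
  static double[] itemVector(long itemID) {
    Random rng = new Random(SEED ^ itemID);
    double[] v = new double[DIMENSIONS];
    for (int i = 0; i < DIMENSIONS; i++) {
      v[i] = rng.nextGaussian();
    }
    return v;
  }

  // User vector = preference-weighted sum of item vectors. Summing many
  // random vectors is also what pushes the result toward a Gaussian.
  static double[] userVector(long[] itemIDs, float[] prefs) {
    double[] v = new double[DIMENSIONS];
    for (int n = 0; n < itemIDs.length; n++) {
      double[] item = itemVector(itemIDs[n]);
      for (int i = 0; i < DIMENSIONS; i++) {
        v[i] += prefs[n] * item[i];
      }
    }
    return v;
  }
}
{code}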
You're right about the forum. Recommenders seemed a good way to use this, but
it also works with text collocation.
> Semantic Vectors: generate and use vectors from User/Item Taste data models
> ----------------------------------------------------------------------------
>
> Key: MAHOUT-752
> URL: https://issues.apache.org/jira/browse/MAHOUT-752
> Project: Mahout
> Issue Type: New Feature
> Components: Collaborative Filtering
> Reporter: Lance Norskog
> Assignee: Sean Owen
> Priority: Minor
> Attachments: SemanticVectors.patch
>
>
> This package has two parts:
> # SemanticVectorFactory creates geometric vectors based on non-geometric
> User/Item ratings.
> # VectorDataModel stores these and does preference evaluation based on the
> vectors and a given DistanceMeasure.
> This is a large exploration of the Semantic Vectors concept:
> [http://code.google.com/p/semanticvectors/], which was the inspiration for
> this project.
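For concreteness, a hypothetical sketch of how those two parts might be wired
together. SemanticVectorFactory and VectorDataModel are the names from the
attached patch, but every constructor and method signature below is assumed,
not taken from the actual code:
{code:java}
import java.io.File;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.common.distance.EuclideanDistanceMeasure;

// Hypothetical wiring of the patch's two parts; signatures are guesses.
public class SemanticVectorsSketch {
  public static void main(String[] args) throws Exception {
    DataModel ratings = new FileDataModel(new File("ratings.csv"));

    // Assumed API: turn the non-geometric rating matrix into
    // 200-dimensional geometric vectors for users and items.
    SemanticVectorFactory factory = new SemanticVectorFactory(ratings, 200);

    // Assumed API: a DataModel backed by those vectors that evaluates
    // preferences by distance between user and item vectors.
    DataModel model = new VectorDataModel(factory.getUserVectors(),
        factory.getItemVectors(), new EuclideanDistanceMeasure());

    System.out.println("estimated preference: "
        + model.getPreferenceValue(1L, 42L));
  }
}
{code}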