Space: Apache Mahout (https://cwiki.apache.org/confluence/display/MAHOUT)
Page: SVD - Singular Value Decomposition
(https://cwiki.apache.org/confluence/display/MAHOUT/SVD+-+Singular+Value+Decomposition)
Change Comment:
---------------------------------------------------------------------
linked mail thread and quora discussion
Edited by Dan Brickley:
---------------------------------------------------------------------
{excerpt}Singular Value Decomposition is a form of product decomposition of a
matrix in which a rectangular matrix A is decomposed into a product U s V'
where U and V are orthonormal and s is a diagonal matrix.{excerpt} The values
of A can be real or complex, but the real case dominates applications in
machine learning. The most prominent properties of the SVD are:
* The decomposition of any real matrix has only real values
* The SVD is unique except for column permutations of U, s and V
* If you take only the largest n values of s and set the rest to zero, you
have a least squares approximation of A with rank n. This allows SVD to be
used very effectively in least squares regression and makes partial SVD useful.
* The SVD can be computed accurately for singular or nearly singular
matrices. For a matrix of rank n, only the first n singular values will be
non-zero. This allows SVD to be used for solution of singular linear systems.
The columns of U and V corresponding to zero singular values define the null
space of A.
* The partial SVD of very large matrices can be computed very quickly using
stochastic decompositions. See http://arxiv.org/abs/0909.4061v1 for details.
Gradient descent can also be used to compute partial SVD's and is very useful
where some values of the matrix being decomposed are not known.
In collaborative filtering and text retrieval, it is common to compute the
partial decomposition of the user x item interaction matrix or the document x
term matrix. This allows the projection of users and items (or documents and
terms) into a common vector space representation that is often referred to as
the latent semantic representation. This process is sometimes called Latent
Semantic Analysis and has been very effective in the analysis of the Netflix
dataset.
Dimension Reduction in Mahout:
* https://cwiki.apache.org/MAHOUT/dimensional-reduction.html
See Also:
* http://www.kwon3d.com/theory/jkinem/svd.html
* http://en.wikipedia.org/wiki/Singular_value_decomposition
* http://en.wikipedia.org/wiki/Latent_semantic_analysis
* http://en.wikipedia.org/wiki/Netflix_Prize
*
http://www.amazon.com/Understanding-Complex-Datasets-Decompositions-Knowledge/dp/1584888326
* http://web.mit.edu/be.400/www/SVD/Singular_Value_Decomposition.htm
*
http://www.quora.com/What-s-the-best-parallelized-sparse-SVD-code-publicly-available
*
[http://mail-archives.apache.org/mod_mbox/mahout-user/201102.mbox/%[email protected]%3E|understanding
Mahout Hadoop SVD thread]
Change your notification preferences:
https://cwiki.apache.org/confluence/users/viewnotifications.action