Author: akm
Date: Thu Feb 2 23:36:41 2017
New Revision: 1781487
URL: http://svn.apache.org/viewvc?rev=1781487&view=rev
Log:
MAHOUT-1682 and 1686: SPCA and ALS pages.
Added:
mahout/site/mahout_cms/trunk/content/users/algorithms/d-als.mdtext
- copied unchanged from r1781457,
mahout/site/mahout_cms/trunk/content/users/algorithms/d-qr.mdtext
mahout/site/mahout_cms/trunk/content/users/algorithms/d-spca.mdtext
- copied, changed from r1781457,
mahout/site/mahout_cms/trunk/content/users/algorithms/d-qr.mdtext
Modified:
mahout/site/mahout_cms/trunk/content/users/algorithms/d-qr.mdtext
Modified: mahout/site/mahout_cms/trunk/content/users/algorithms/d-qr.mdtext
URL:
http://svn.apache.org/viewvc/mahout/site/mahout_cms/trunk/content/users/algorithms/d-qr.mdtext?rev=1781487&r1=1781486&r2=1781487&view=diff
==============================================================================
--- mahout/site/mahout_cms/trunk/content/users/algorithms/d-qr.mdtext (original)
+++ mahout/site/mahout_cms/trunk/content/users/algorithms/d-qr.mdtext Thu Feb
2 23:36:41 2017
@@ -3,11 +3,11 @@
## Intro
-Mahout has a distributed implementation of QR decomposition for tall thin
matricies[1].
+Mahout has a distributed implementation of QR decomposition for tall thin
matrices[1].
## Algorithm
-For the classic QR decomposition of the form
`\(\mathbf{A}=\mathbf{QR},\mathbf{A}\in\mathbb{R}^{m\times n}\)` a distributed
version is fairly easily achieved if `\(\mathbf{A}\)` is tall and thin such
that `\(\mathbf{A}^{\top}\mathbf{A}\)` fits in memory, i.e. *m* is large but
*n* < ~5000 Under such circumstances, only `\(\mathbf{A}\)` and
`\(\mathbf{Q}\)` are distributed matricies and `\(\mathbf{A^{\top}A}\)` and
`\(\mathbf{R}\)` are in-core products. We just compute the in-core version of
the Cholesky decomposition in the form of `\(\mathbf{LL}^{\top}=
\mathbf{A}^{\top}\mathbf{A}\)`. After that we take `\(\mathbf{R}=
\mathbf{L}^{\top}\)` and
`\(\mathbf{Q}=\mathbf{A}\left(\mathbf{L}^{\top}\right)^{-1}\)`. The latter is
easily achieved by multiplying each verticle block of `\(\mathbf{A}\)` by
`\(\left(\mathbf{L}^{\top}\right)^{-1}\)`. (There is no actual matrix
inversion happening).
+For the classic QR decomposition of the form
`\(\mathbf{A}=\mathbf{QR},\mathbf{A}\in\mathbb{R}^{m\times n}\)` a distributed
version is fairly easily achieved if `\(\mathbf{A}\)` is tall and thin such
that `\(\mathbf{A}^{\top}\mathbf{A}\)` fits in memory, i.e. *m* is large but
*n* < ~5000 Under such circumstances, only `\(\mathbf{A}\)` and
`\(\mathbf{Q}\)` are distributed matrices and `\(\mathbf{A^{\top}A}\)` and
`\(\mathbf{R}\)` are in-core products. We just compute the in-core version of
the Cholesky decomposition in the form of `\(\mathbf{LL}^{\top}=
\mathbf{A}^{\top}\mathbf{A}\)`. After that we take `\(\mathbf{R}=
\mathbf{L}^{\top}\)` and
`\(\mathbf{Q}=\mathbf{A}\left(\mathbf{L}^{\top}\right)^{-1}\)`. The latter is
easily achieved by multiplying each vertical block of `\(\mathbf{A}\)` by
`\(\left(\mathbf{L}^{\top}\right)^{-1}\)`. (There is no actual matrix
inversion happening).
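The Cholesky QR steps described in the paragraph above can be illustrated with a small in-memory sketch. This is not Mahout's implementation: the function names (`gram`, `cholesky`, `solve_row`, `cholesky_qr`) are hypothetical, and plain Python lists of row blocks stand in for a distributed matrix. It assumes `A` has full column rank, so that `A^T A` is positive definite.

```python
import math


def gram(blocks, n):
    """Accumulate the n x n matrix A^T A over the row blocks of A."""
    g = [[0.0] * n for _ in range(n)]
    for block in blocks:
        for row in block:
            for i in range(n):
                for j in range(n):
                    g[i][j] += row[i] * row[j]
    return g


def cholesky(g):
    """Return lower-triangular L with L L^T = g (g symmetric positive definite)."""
    n = len(g)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = g[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def solve_row(L, a):
    """Solve q L^T = a for the row vector q by substitution (no explicit inverse)."""
    q = [0.0] * len(a)
    for j in range(len(a)):
        q[j] = (a[j] - sum(q[k] * L[j][k] for k in range(j))) / L[j][j]
    return q


def cholesky_qr(blocks, n):
    """Return (Q as row blocks, R) with R = L^T and Q = A (L^T)^{-1}."""
    L = cholesky(gram(blocks, n))
    R = [[L[j][i] for j in range(n)] for i in range(n)]  # transpose of L
    # Each block of A is processed independently, which is the step that
    # would run in parallel in a distributed setting.
    Q_blocks = [[solve_row(L, row) for row in block] for block in blocks]
    return Q_blocks, R


if __name__ == "__main__":
    A_blocks = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0]]]  # two row blocks, n = 2
    Q_blocks, R = cholesky_qr(A_blocks, 2)
    print(Q_blocks)
    print(R)
```

Only the `n x n` matrices `A^T A`, `L`, and `R` are held in one place; each row of `Q` comes from a triangular solve against `L`, which is why no matrix inversion is needed.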
Copied: mahout/site/mahout_cms/trunk/content/users/algorithms/d-spca.mdtext
(from r1781457,
mahout/site/mahout_cms/trunk/content/users/algorithms/d-qr.mdtext)
URL:
http://svn.apache.org/viewvc/mahout/site/mahout_cms/trunk/content/users/algorithms/d-spca.mdtext?p2=mahout/site/mahout_cms/trunk/content/users/algorithms/d-spca.mdtext&p1=mahout/site/mahout_cms/trunk/content/users/algorithms/d-qr.mdtext&r1=1781457&r2=1781487&rev=1781487&view=diff
==============================================================================
--- mahout/site/mahout_cms/trunk/content/users/algorithms/d-qr.mdtext (original)
+++ mahout/site/mahout_cms/trunk/content/users/algorithms/d-spca.mdtext Thu Feb
2 23:36:41 2017
@@ -1,14 +1,13 @@
-# Distributed Cholesky QR
+# Distributed Stochastic PCA
## Intro
-Mahout has a distributed implementation of QR decomposition for tall thin
matricies[1].
+Mahout has a distributed implementation of Stochastic PCA
-## Algorithm
-
-For the classic QR decomposition of the form
`\(\mathbf{A}=\mathbf{QR},\mathbf{A}\in\mathbb{R}^{m\times n}\)` a distributed
version is fairly easily achieved if `\(\mathbf{A}\)` is tall and thin such
that `\(\mathbf{A}^{\top}\mathbf{A}\)` fits in memory, i.e. *m* is large but
*n* < ~5000 Under such circumstances, only `\(\mathbf{A}\)` and
`\(\mathbf{Q}\)` are distributed matricies and `\(\mathbf{A^{\top}A}\)` and
`\(\mathbf{R}\)` are in-core products. We just compute the in-core version of
the Cholesky decomposition in the form of `\(\mathbf{LL}^{\top}=
\mathbf{A}^{\top}\mathbf{A}\)`. After that we take `\(\mathbf{R}=
\mathbf{L}^{\top}\)` and
`\(\mathbf{Q}=\mathbf{A}\left(\mathbf{L}^{\top}\right)^{-1}\)`. The latter is
easily achieved by multiplying each verticle block of `\(\mathbf{A}\)` by
`\(\left(\mathbf{L}^{\top}\right)^{-1}\)`. (There is no actual matrix
inversion happening).
+## Motivation
+The Stochastic SVD method in Mahout produces a reduced-rank Singular Value
Decomposition in its strict mathematical definition:
`\(\mathbf{A}\approx\mathbf{U}\mathbf{\Sigma}\mathbf{V}^{\top}\)`, i.e. it creates outputs for the matrices
`\(\mathbf{U}\)`, `\(\mathbf{V}\)`, and `\(\mathbf{\Sigma}\)`, each of which may be requested
individually. The desired rank of the decomposition, henceforth denoted as
*k*`\(\in\mathbb{N}_1\)`, is a parameter of the algorithm. The singular values
inside the diagonal matrix `\(\mathbf{\Sigma}\)` satisfy
`\(\sigma_{i+1}\leq\sigma_{i}\ \forall i\in[1,k-1]\)`, i.e. they are
sorted from largest to smallest. Cases of rank deficiency, `\(\mathrm{rank}(\mathbf{A})<k\)`, are handled
by producing 0s in the singular value positions once the deficiency takes place.
## Implementation