Author: buildbot
Date: Fri Feb 3 23:13:23 2017
New Revision: 1006165
Log:
Staging update by buildbot for mahout
Modified:
websites/staging/mahout/trunk/content/ (props changed)
websites/staging/mahout/trunk/content/users/algorithms/d-spca.html
Propchange: websites/staging/mahout/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Fri Feb 3 23:13:23 2017
@@ -1 +1 @@
-1781616
+1781617
Modified: websites/staging/mahout/trunk/content/users/algorithms/d-spca.html
==============================================================================
--- websites/staging/mahout/trunk/content/users/algorithms/d-spca.html
(original)
+++ websites/staging/mahout/trunk/content/users/algorithms/d-spca.html Fri Feb
3 23:13:23 2017
@@ -283,27 +283,31 @@ h2:hover > .headerlink, h3:hover > .head
<h2 id="intro">Intro<a class="headerlink" href="#intro" title="Permanent
link">¶</a></h2>
<p>Mahout has a distributed implementation of Stochastic PCA[1].</p>
<h2 id="algorithm">Algorithm<a class="headerlink" href="#algorithm"
title="Permanent link">¶</a></h2>
-<p>Given an <em>m</em> <code>\(\times\)</code> <em>n</em> matrix
<code>\(\mathbf{A}\)</code>, a target rank <em>k</em>, and an oversampling
parameter <em>p</em>, this procedure computes a <em>k</em>-rank PCA by finding
the unknowns in <code>\(\mathbf{Aâ1\mu^\top \ge U\Sigma V}\)</code>:
-1. Create seed for random <em>n</em> <code>\(\times\)</code> <em>(k+p)</em>
matrix <code>\(\Omega\)</code>.
-2. <code>\(s_\Omega \leftarrow \Omega^\top \mu\)</code>.
-3. <code>\(\mathbf{Y_0} \leftarrow A\Omega â 1(s_\Omega)^\top, Y \in
\mathbb{R}^(m\times(k+p))\)</code>.
-4. Column-orthonormalize <code>\(\mathbf{Y_0} \rightarrow \mathbf{Q}\)</code>
by computing thin decomposition <code>\(\mathbf{Y_0} = \mathbf{QR}\)</code>.
Also, <code>\(\mathbf{Q}\in\mathbb{R}^(m\times(k+p)),
\mathbf{R}\in\mathbb{R}^((k+p)\times(k+p))\)</code>.
-5. <code>\(s_Q \leftarrow Q^\top 1\)</code>.
-6. <code>\(\mathbf{B_0} \leftarrow Q^\top A: B \in \mathbb{R}^((k+p)\times
n)\)</code>.
-7. <code>\(s_B \leftarrow (B_0)^\top \mu\)</code>.
-8. For <em>i</em> in 1..<em>q</em> repeat (power iterations):
- - For <em>j</em> in 1..<em>n</em> <code>\(apply(B_(iâ1))_(âj)
\leftarrow (B_(iâ1))_(âj)â\mu_j s_Q\)</code>.
- - <code>\(\mathbf{Y_i) \leftarrow
\mathbf{(AB_(iâ1)^\top)â1(s_Bâ\mu^\top \mu s_Q^\top)}\)</code>.
- - Column-orthonormalize <code>\(\mathbf{Y_i} \rightarrow
\mathbf{Q}\)</code> by computing thin decomposition <code>\(\mathbf{Y_i =
QR}\)</code>.
- - <code>\(\mathbf{s_Q \leftarrow Q^\top 1}\)</code>.
- - <code>\(\mathbf{B_i \leftarrow Q^\top A}\)</code>.
- - <code>\(\mathbf{s_B \leftarrow (B_i)^\top \mu}\)</code>.
-9. Let <code>\(\mathbf{C \triangleq s_Q (s_B)^\top}\)</code>.
<code>\(\mathbf{M \leftarrow B_q (B_q)^\top â C â C^\top + \mu^\top \mu s_Q
(s_Q)^\top}\)</code>.
-10. Compute an eigensolution of the small symmetric <code>\(\mathbf{M =
\hat{U} \Lambda \hat{U}^\top: M \in \mathbb{R}^((k+p)\times(k+p))}\)</code>.
-11. The singular values <code>\(\Sigma = \Lambda^(\circ 0.5)\)</code>, or, in
other words, <code>\(\mathbf{\sigma_i= \sqrt{\lambda_i}}\)</code>.
-12. If needed, compute <code>\(\mathbf{U = Q\hat{U}}\)</code>.
-13. If needed, compute <code>\(\mathbf{V = B^\top \hat{U}
\Sigma^(â1)}\)</code>. Another way is <code>\(\mathbf{V = A^\top
U\Sigma^(â1)}\)1.
-14. If needed, items converted to the PCA space can be computed
as</code>(\mathbf{U\Sigma})`.</p>
+<p>Given an <em>m</em> <code>\(\times\)</code> <em>n</em> matrix
<code>\(\mathbf{A}\)</code>, a target rank <em>k</em>, and an oversampling
parameter <em>p</em>, this procedure computes a <em>k</em>-rank PCA by finding
the unknowns in <code>\(\mathbf{Aâ1\mu^\top \ge U\Sigma V}\)</code>:</p>
+<ol>
+<li>Create seed for random <em>n</em> <code>\(\times\)</code> <em>(k+p)</em>
matrix <code>\(\Omega\)</code>.</li>
+<li><code>\(s_\Omega \leftarrow \Omega^\top \mu\)</code>.</li>
+<li><code>\(\mathbf{Y_0} \leftarrow A\Omega â 1(s_\Omega)^\top, Y \in
\mathbb{R}^(m\times(k+p))\)</code>.</li>
+<li>Column-orthonormalize <code>\(\mathbf{Y_0} \rightarrow \mathbf{Q}\)</code>
by computing thin decomposition <code>\(\mathbf{Y_0} = \mathbf{QR}\)</code>.
Also, <code>\(\mathbf{Q}\in\mathbb{R}^(m\times(k+p)),
\mathbf{R}\in\mathbb{R}^((k+p)\times(k+p))\)</code>.</li>
+<li><code>\(s_Q \leftarrow Q^\top 1\)</code>.</li>
+<li><code>\(\mathbf{B_0} \leftarrow Q^\top A: B \in \mathbb{R}^((k+p)\times
n)\)</code>.</li>
+<li><code>\(s_B \leftarrow (B_0)^\top \mu\)</code>.</li>
+<li>For <em>i</em> in 1..<em>q</em> repeat (power iterations):<ul>
+<li>For <em>j</em> in 1..<em>n</em> <code>\(apply(B_(iâ1))_(âj) \leftarrow
(B_(iâ1))_(âj)â\mu_j s_Q\)</code>.</li>
+<li><code>\(\mathbf{Y_i) \leftarrow
\mathbf{(AB_(iâ1)^\top)â1(s_Bâ\mu^\top \mu s_Q^\top)}\)</code>.</li>
+<li>Column-orthonormalize <code>\(\mathbf{Y_i} \rightarrow \mathbf{Q}\)</code>
by computing thin decomposition <code>\(\mathbf{Y_i = QR}\)</code>.</li>
+<li><code>\(\mathbf{s_Q \leftarrow Q^\top 1}\)</code>.</li>
+<li><code>\(\mathbf{B_i \leftarrow Q^\top A}\)</code>.</li>
+<li><code>\(\mathbf{s_B \leftarrow (B_i)^\top \mu}\)</code>.</li>
+</ul>
+</li>
+<li>Let <code>\(\mathbf{C \triangleq s_Q (s_B)^\top}\)</code>.
<code>\(\mathbf{M \leftarrow B_q (B_q)^\top â C â C^\top + \mu^\top \mu s_Q
(s_Q)^\top}\)</code>.</li>
+<li>Compute an eigensolution of the small symmetric <code>\(\mathbf{M =
\hat{U} \Lambda \hat{U}^\top: M \in
\mathbb{R}^((k+p)\times(k+p))}\)</code>.</li>
+<li>The singular values <code>\(\Sigma = \Lambda^(\circ 0.5)\)</code>, or, in
other words, <code>\(\mathbf{\sigma_i= \sqrt{\lambda_i}}\)</code>.</li>
+<li>If needed, compute <code>\(\mathbf{U = Q\hat{U}}\)</code>.</li>
+<li>If needed, compute <code>\(\mathbf{V = B^\top \hat{U}
\Sigma^(â1)}\)</code>. Another way is `(\mathbf{V = A^\top
U\Sigma^(â1)})1.</li>
+<li>If needed, items converted to the PCA space can be computed as
<code>\(\mathbf{U\Sigma}\)</code>.</li>
+</ol>
<h2 id="implementation">Implementation<a class="headerlink"
href="#implementation" title="Permanent link">¶</a></h2>
<p>Mahout <code>dspca(...)</code> is implemented in the mahout
<code>math-scala</code> algebraic optimizer which translates Mahout's R-like
linear algebra operators into a physical plan for both Spark and H2O
distributed engines.</p>
<p>def dspca<a href="drmA: DrmLike[K], k: Int, p: Int = 15, q: Int = 0">K</a>: