git commit: [MLlib][SPARK-2997] Update SVD documentation to reflect roughly square

meng Sun, 24 Aug 2014 17:36:44 -0700

Repository: spark
Updated Branches:
  refs/heads/branch-1.1 a4db81a55 -> 749bddc85



[MLlib][SPARK-2997] Update SVD documentation to reflect roughly square

Update the documentation to reflect the fact we can handle roughly square 
matrices.

Author: Reza Zadeh <[email protected]>

Closes #2070 from rezazadeh/svddocs and squashes the following commits:

826b8fe [Reza Zadeh] left singular vectors
3f34fc6 [Reza Zadeh] PCA is still TS
7ffa2aa [Reza Zadeh] better title
aeaf39d [Reza Zadeh] More docs
788ed13 [Reza Zadeh] add computational cost explanation
6429c59 [Reza Zadeh] Add link to rowmatrix docs
1eeab8b [Reza Zadeh] Update SVD documentation to reflect roughly square

(cherry picked from commit b1b20301b3a1b35564d61e58eb5964d5ad5e4d7d)
Signed-off-by: Xiangrui Meng <[email protected]>


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/749bddc8
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/749bddc8
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/749bddc8

Branch: refs/heads/branch-1.1
Commit: 749bddc85e76e0d1ded8d79058819335bd580741
Parents: a4db81a
Author: Reza Zadeh <[email protected]>
Authored: Sun Aug 24 17:35:54 2014 -0700
Committer: Xiangrui Meng <[email protected]>
Committed: Sun Aug 24 17:36:06 2014 -0700

----------------------------------------------------------------------
 docs/mllib-dimensionality-reduction.md | 29 +++++++++++++++++++++++------
 1 file changed, 23 insertions(+), 6 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/749bddc8/docs/mllib-dimensionality-reduction.md
----------------------------------------------------------------------
diff --git a/docs/mllib-dimensionality-reduction.md b/docs/mllib-dimensionality-reduction.md
index 065d646..9f2cf6d 100644
--- a/docs/mllib-dimensionality-reduction.md
+++ b/docs/mllib-dimensionality-reduction.md
@@ -11,7 +11,7 @@ displayTitle: <a href="mllib-guide.html">MLlib</a> - Dimensionality Reduction
 of reducing the number of variables under consideration.
 It can be used to extract latent features from raw and noisy features
 or compress data while maintaining the structure.
-MLlib provides support for dimensionality reduction on tall-and-skinny matrices.
+MLlib provides support for dimensionality reduction on the <a href="mllib-basics.html#rowmatrix">RowMatrix</a> class.
 
 ## Singular value decomposition (SVD)
 
@@ -39,8 +39,26 @@ If we keep the top $k$ singular values, then the dimensions of the resulting low
 * `$\Sigma$`: `$k \times k$`,
 * `$V$`: `$n \times k$`.
  
-MLlib provides SVD functionality to row-oriented matrices that have only a few columns,
-say, less than $1000$, but many rows, i.e., *tall-and-skinny* matrices.
+### Performance
+We assume $n$ is smaller than $m$. The singular values and the right singular vectors are derived
+from the eigenvalues and the eigenvectors of the Gramian matrix $A^T A$. The matrix
+storing the left singular vectors $U$, is computed via matrix multiplication as
+$U = A (V S^{-1})$, if requested by the user via the computeU parameter.
+The actual method to use is determined automatically based on the computational cost:
+
+* If $n$ is small ($n < 100$) or $k$ is large compared with $n$ ($k > n / 2$), we compute the Gramian matrix
+first and then compute its top eigenvalues and eigenvectors locally on the driver.
+This requires a single pass with $O(n^2)$ storage on each executor and on the driver, and
+$O(n^2 k)$ time on the driver.
+* Otherwise, we compute $(A^T A) v$ in a distributive way and send it to
+<a href="http://www.caam.rice.edu/software/ARPACK/">ARPACK</a> to
+compute $(A^T A)$'s top eigenvalues and eigenvectors on the driver node. This requires $O(k)$
+passes, $O(n)$ storage on each executor, and $O(n k)$ storage on the driver.
+
+### SVD Example
+ 
+MLlib provides SVD functionality to row-oriented matrices, provided in the
+<a href="mllib-basics.html#rowmatrix">RowMatrix</a> class. 
 
 <div class="codetabs">
 <div data-lang="scala" markdown="1">
@@ -124,9 +142,8 @@ MLlib supports PCA for tall-and-skinny matrices stored in row-oriented format.
 <div class="codetabs">
 <div data-lang="scala" markdown="1">
 
-The following code demonstrates how to compute principal components on a tall-and-skinny `RowMatrix`
+The following code demonstrates how to compute principal components on a `RowMatrix`
 and use them to project the vectors into a low-dimensional space.
-The number of columns should be small, e.g, less than 1000.
 
 {% highlight scala %}
 import org.apache.spark.mllib.linalg.Matrix
@@ -144,7 +161,7 @@ val projected: RowMatrix = mat.multiply(pc)
 
 <div data-lang="java" markdown="1">
 
-The following code demonstrates how to compute principal components on a tall-and-skinny `RowMatrix`
+The following code demonstrates how to compute principal components on a `RowMatrix`
 and use them to project the vectors into a low-dimensional space.
 The number of columns should be small, e.g, less than 1000.
 

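The relations stated in the Performance section of the diff above can be checked with a short derivation. This is only a sketch under stated assumptions: a thin SVD $A = U \Sigma V^T$ with orthonormal columns in $U$ and $V$, the symbol $S$ in the committed text read as $\Sigma$, and nonzero retained singular values $\sigma_i$ (with $u_i$, $v_i$ the $i$-th columns of $U$, $V$).

```latex
\begin{aligned}
A^T A &= (U \Sigma V^T)^T (U \Sigma V^T)
       = V \Sigma \, U^T U \, \Sigma V^T
       = V \Sigma^2 V^T, \\
A^T A \, v_i &= \sigma_i^2 \, v_i, \\
A v_i &= U \Sigma V^T v_i = \sigma_i u_i
  \quad\Longrightarrow\quad
  u_i = \frac{1}{\sigma_i} A v_i .
\end{aligned}
```

Stacking the top $k$ columns gives $U = A (V \Sigma^{-1})$, which matches the formula in the committed text: the eigen-solver only ever sees the $n \times n$ Gramian, and $U$ costs one extra multiplication when requested.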

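The selection rule and the single-pass Gramian described in the Performance section might be sketched as follows. This is an illustration in plain Java, not Spark's implementation: the class `SvdModeSketch`, the `Mode` labels, and both method names are hypothetical, and the thresholds are taken directly from the committed text ($n < 100$ or $k > n / 2$).

```java
import java.util.Arrays;

/** Illustrative sketch only; names here are hypothetical, not MLlib API. */
final class SvdModeSketch {

    /** Hypothetical labels for the two strategies in the committed text. */
    enum Mode { LOCAL_EIGS, DIST_ARPACK }

    /**
     * Applies the documented rule: use the local Gramian path when n is
     * small (n < 100) or k is large relative to n (k > n / 2).
     */
    static Mode chooseMode(int n, int k) {
        if (k < 1 || k > n) {
            throw new IllegalArgumentException("need 1 <= k <= n");
        }
        // 2k > n is k > n / 2 without integer-division truncation.
        return (n < 100 || 2L * k > n) ? Mode.LOCAL_EIGS : Mode.DIST_ARPACK;
    }

    /**
     * Accumulates the n x n Gramian A^T A in one pass over the rows of A,
     * using O(n^2) storage regardless of the number of rows.
     */
    static double[][] gramian(double[][] rows, int n) {
        double[][] g = new double[n][n];
        for (double[] row : rows) {
            if (row.length != n) {
                throw new IllegalArgumentException("row length must be n");
            }
            // Add the outer product row^T * row to the running sum.
            for (int i = 0; i < n; i++) {
                for (int j = 0; j < n; j++) {
                    g[i][j] += row[i] * row[j];
                }
            }
        }
        return g;
    }

    public static void main(String[] args) {
        System.out.println(chooseMode(50, 5));      // LOCAL_EIGS (n < 100)
        System.out.println(chooseMode(1000, 600));  // LOCAL_EIGS (k > n / 2)
        System.out.println(chooseMode(1000, 20));   // DIST_ARPACK
        double[][] a = {{1, 2}, {3, 4}, {5, 6}};
        // prints [[35.0, 44.0], [44.0, 56.0]]
        System.out.println(Arrays.deepToString(gramian(a, 2)));
    }
}
```

The Gramian loop shows why the first path needs only one pass and $O(n^2)$ memory: each row contributes an outer product to a fixed-size accumulator, so the row count $m$ never affects storage.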