Repository: spark
Updated Branches:
  refs/heads/master 85e9d091d -> a8eb92dcb


[SPARK-5507] Added documentation for BlockMatrix

Docs for BlockMatrix. mengxr

Author: Burak Yavuz <brk...@gmail.com>

Closes #4664 from brkyvz/SPARK-5507PR and squashes the following commits:

4db30b0 [Burak Yavuz] [SPARK-5507] Added documentation for BlockMatrix


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a8eb92dc
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a8eb92dc
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a8eb92dc

Branch: refs/heads/master
Commit: a8eb92dcb9ab1e6d8a34eed9a8fddeda645b5094
Parents: 85e9d09
Author: Burak Yavuz <brk...@gmail.com>
Authored: Wed Feb 18 10:11:08 2015 -0800
Committer: Xiangrui Meng <m...@databricks.com>
Committed: Wed Feb 18 10:11:08 2015 -0800

----------------------------------------------------------------------
 docs/mllib-data-types.md | 75 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 75 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/a8eb92dc/docs/mllib-data-types.md
----------------------------------------------------------------------
diff --git a/docs/mllib-data-types.md b/docs/mllib-data-types.md
index 101dc2f..24d22b9 100644
--- a/docs/mllib-data-types.md
+++ b/docs/mllib-data-types.md
@@ -296,6 +296,81 @@ backed by an RDD of its entries.
 The underlying RDDs of a distributed matrix must be deterministic, because we cache the matrix size.
 In general the use of non-deterministic RDDs can lead to errors.
 
+### BlockMatrix
+
+A `BlockMatrix` is a distributed matrix backed by an RDD of `MatrixBlock`s, where a `MatrixBlock` is
+a tuple of `((Int, Int), Matrix)`: the `(Int, Int)` is the index of the block, and the `Matrix` is
+the sub-matrix at the given index with size `rowsPerBlock` x `colsPerBlock`.
+`BlockMatrix` supports methods such as `.add` and `.multiply` with another `BlockMatrix`.
+`BlockMatrix` also has a helper function `.validate` that can be used to check whether the
+`BlockMatrix` is set up properly.
+
+<div class="codetabs">
+<div data-lang="scala" markdown="1">
+
+A [`BlockMatrix`](api/scala/index.html#org.apache.spark.mllib.linalg.distributed.BlockMatrix) can be
+most easily created from an `IndexedRowMatrix` or `CoordinateMatrix` by calling `.toBlockMatrix()`.
+By default, `.toBlockMatrix()` creates blocks of size 1024 x 1024. Users may change the block size
+by supplying the values through `.toBlockMatrix(rowsPerBlock, colsPerBlock)`.
+
+{% highlight scala %}
+import org.apache.spark.mllib.linalg.SingularValueDecomposition
+import org.apache.spark.mllib.linalg.distributed.{BlockMatrix, CoordinateMatrix, MatrixEntry}
+
+val entries: RDD[MatrixEntry] = ... // an RDD of (i, j, v) matrix entries
+// Create a CoordinateMatrix from an RDD[MatrixEntry].
+val coordMat: CoordinateMatrix = new CoordinateMatrix(entries)
+// Transform the CoordinateMatrix to a BlockMatrix
+val matA: BlockMatrix = coordMat.toBlockMatrix().cache()
+
+// Validate whether the BlockMatrix is set up properly. Throws an Exception when it is not valid.
+// Nothing happens if it is valid.
+matA.validate
+
+// Calculate A^T A.
+val AtransposeA = matA.transpose.multiply(matA)
+
+// Compute the SVD of 2 * A: top 20 singular values, without computing U, with rCond = 1e-9.
+val A2 = matA.add(matA)
+val svd = A2.toIndexedRowMatrix().computeSVD(20, false, 1e-9)
+{% endhighlight %}
+</div>
+
+<div data-lang="java" markdown="1">
+
+A [`BlockMatrix`](api/scala/index.html#org.apache.spark.mllib.linalg.distributed.BlockMatrix) can be
+most easily created from an `IndexedRowMatrix` or `CoordinateMatrix` by calling `.toBlockMatrix()`.
+By default, `.toBlockMatrix()` creates blocks of size 1024 x 1024. Users may change the block size
+by supplying the values through `.toBlockMatrix(rowsPerBlock, colsPerBlock)`.
+
+{% highlight java %}
+import org.apache.spark.api.java.JavaRDD;
+import org.apache.spark.mllib.linalg.SingularValueDecomposition;
+import org.apache.spark.mllib.linalg.distributed.BlockMatrix;
+import org.apache.spark.mllib.linalg.distributed.CoordinateMatrix;
+import org.apache.spark.mllib.linalg.distributed.IndexedRowMatrix;
+
+JavaRDD<MatrixEntry> entries = ... // a JavaRDD of (i, j, v) matrix entries
+// Create a CoordinateMatrix from a JavaRDD<MatrixEntry>.
+CoordinateMatrix coordMat = new CoordinateMatrix(entries.rdd());
+// Transform the CoordinateMatrix to a BlockMatrix
+BlockMatrix matA = coordMat.toBlockMatrix().cache();
+
+// Validate whether the BlockMatrix is set up properly. Throws an Exception when it is not valid.
+// Nothing happens if it is valid.
+matA.validate();
+
+// Calculate A^T A.
+BlockMatrix AtransposeA = matA.transpose().multiply(matA);
+
+// Compute the SVD of 2 * A: top 20 singular values, without computing U, with rCond = 1e-9.
+BlockMatrix A2 = matA.add(matA);
+SingularValueDecomposition<IndexedRowMatrix, Matrix> svd =
+  A2.toIndexedRowMatrix().computeSVD(20, false, 1e-9);
+{% endhighlight %}
+</div>
+</div>
+
 ### RowMatrix
 
 A `RowMatrix` is a row-oriented distributed matrix without meaningful row indices, backed by an RDD
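
As a reading aid for the `BlockMatrix` description in the diff above, here is a minimal sketch of how a global entry `(i, j)` could map to a block index and an offset inside that block, given `rowsPerBlock` and `colsPerBlock`. It is plain Java with no Spark dependency; the class `BlockIndexSketch` and the method `locate` are hypothetical names introduced for illustration only, and the integer-division layout is an assumption based on the block description, not code taken from the commit.

```java
public class BlockIndexSketch {

    /**
     * Hypothetical helper (not a Spark API). Assumes blocks are laid out on a
     * regular grid, so integer division gives the block index and the
     * remainder gives the position inside the block.
     *
     * @return {blockRow, blockCol, rowInBlock, colInBlock}
     */
    static long[] locate(long i, long j, int rowsPerBlock, int colsPerBlock) {
        if (i < 0 || j < 0 || rowsPerBlock <= 0 || colsPerBlock <= 0) {
            throw new IllegalArgumentException("indices must be >= 0 and block sizes > 0");
        }
        return new long[] {
            i / rowsPerBlock,   // block row index
            j / colsPerBlock,   // block column index
            i % rowsPerBlock,   // row offset inside the block
            j % colsPerBlock    // column offset inside the block
        };
    }

    public static void main(String[] args) {
        // Entry (2500, 1023) with 1024 x 1024 blocks, the size mentioned in the docs.
        long[] loc = locate(2500, 1023, 1024, 1024);
        System.out.println("block (" + loc[0] + ", " + loc[1] + "), offset ("
            + loc[2] + ", " + loc[3] + ")");
        // prints "block (2, 0), offset (452, 1023)"
    }
}
```

Under this assumption, smaller blocks spread a matrix over more `((Int, Int), Matrix)` tuples, which is the trade-off behind passing explicit values to `.toBlockMatrix(rowsPerBlock, colsPerBlock)`.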

