Repository: spark
Updated Branches:
  refs/heads/master 85e9d091d -> a8eb92dcb
[SPARK-5507] Added documentation for BlockMatrix

Docs for BlockMatrix. mengxr

Author: Burak Yavuz <brk...@gmail.com>

Closes #4664 from brkyvz/SPARK-5507PR and squashes the following commits:

4db30b0 [Burak Yavuz] [SPARK-5507] Added documentation for BlockMatrix

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a8eb92dc
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a8eb92dc
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a8eb92dc

Branch: refs/heads/master
Commit: a8eb92dcb9ab1e6d8a34eed9a8fddeda645b5094
Parents: 85e9d09
Author: Burak Yavuz <brk...@gmail.com>
Authored: Wed Feb 18 10:11:08 2015 -0800
Committer: Xiangrui Meng <m...@databricks.com>
Committed: Wed Feb 18 10:11:08 2015 -0800

----------------------------------------------------------------------
 docs/mllib-data-types.md | 75 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 75 insertions(+)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/a8eb92dc/docs/mllib-data-types.md
----------------------------------------------------------------------
diff --git a/docs/mllib-data-types.md b/docs/mllib-data-types.md
index 101dc2f..24d22b9 100644
--- a/docs/mllib-data-types.md
+++ b/docs/mllib-data-types.md
@@ -296,6 +296,81 @@ backed by an RDD of its entries.
 The underlying RDDs of a distributed matrix must be deterministic, because we cache the matrix size.
 In general the use of non-deterministic RDDs can lead to errors.
 
+### BlockMatrix
+
+A `BlockMatrix` is a distributed matrix backed by an RDD of `MatrixBlock`s, where `MatrixBlock` is
+a tuple of `((Int, Int), Matrix)`, where the `(Int, Int)` is the index of the block, and `Matrix` is
+the sub-matrix at the given index with size `rowsPerBlock` x `colsPerBlock`.
+`BlockMatrix` supports methods such as `.add` and `.multiply` with another `BlockMatrix`.
+`BlockMatrix` also has a helper function `.validate` which can be used to debug whether the
+`BlockMatrix` is set up properly.
+
+<div class="codetabs">
+<div data-lang="scala" markdown="1">
+
+A [`BlockMatrix`](api/scala/index.html#org.apache.spark.mllib.linalg.distributed.BlockMatrix) can be
+most easily created from an `IndexedRowMatrix` or `CoordinateMatrix` using `.toBlockMatrix()`.
+`.toBlockMatrix()` will create blocks of size 1024 x 1024. Users may change the sizes of their blocks
+by supplying the values through `.toBlockMatrix(rowsPerBlock, colsPerBlock)`.
+
+{% highlight scala %}
+import org.apache.spark.mllib.linalg.SingularValueDecomposition
+import org.apache.spark.mllib.linalg.distributed.{BlockMatrix, CoordinateMatrix, MatrixEntry}
+
+val entries: RDD[MatrixEntry] = ... // an RDD of (i, j, v) matrix entries
+// Create a CoordinateMatrix from an RDD[MatrixEntry].
+val coordMat: CoordinateMatrix = new CoordinateMatrix(entries)
+// Transform the CoordinateMatrix to a BlockMatrix
+val matA: BlockMatrix = coordMat.toBlockMatrix().cache()
+
+// validate whether the BlockMatrix is set up properly. Throws an Exception when it is not valid.
+// Nothing happens if it is valid.
+matA.validate
+
+// Calculate A^T A.
+val AtransposeA = matA.transpose.multiply(matA)
+
+// get SVD of 2 * A
+val A2 = matA.add(matA)
+val svd = A2.toIndexedRowMatrix().computeSVD(20, false, 1e-9)
+{% endhighlight %}
+</div>
+
+<div data-lang="java" markdown="1">
+
+A [`BlockMatrix`](api/scala/index.html#org.apache.spark.mllib.linalg.distributed.BlockMatrix) can be
+most easily created from an `IndexedRowMatrix` or `CoordinateMatrix` using `.toBlockMatrix()`.
+`.toBlockMatrix()` will create blocks of size 1024 x 1024. Users may change the sizes of their blocks
+by supplying the values through `.toBlockMatrix(rowsPerBlock, colsPerBlock)`.
+
+{% highlight java %}
+import org.apache.spark.api.java.JavaRDD;
+import org.apache.spark.mllib.linalg.SingularValueDecomposition;
+import org.apache.spark.mllib.linalg.distributed.BlockMatrix;
+import org.apache.spark.mllib.linalg.distributed.CoordinateMatrix;
+import org.apache.spark.mllib.linalg.distributed.IndexedRowMatrix;
+
+JavaRDD<MatrixEntry> entries = ... // a JavaRDD of (i, j, v) Matrix Entries
+// Create a CoordinateMatrix from a JavaRDD<MatrixEntry>.
+CoordinateMatrix coordMat = new CoordinateMatrix(entries.rdd());
+// Transform the CoordinateMatrix to a BlockMatrix
+BlockMatrix matA = coordMat.toBlockMatrix().cache();
+
+// validate whether the BlockMatrix is set up properly. Throws an Exception when it is not valid.
+// Nothing happens if it is valid.
+matA.validate();
+
+// Calculate A^T A.
+BlockMatrix AtransposeA = matA.transpose().multiply(matA);
+
+// get SVD of 2 * A
+BlockMatrix A2 = matA.add(matA);
+SingularValueDecomposition<IndexedRowMatrix, Matrix> svd =
+    A2.toIndexedRowMatrix().computeSVD(20, false, 1e-9);
+{% endhighlight %}
+</div>
+</div>
+
 ### RowMatrix
 
 A `RowMatrix` is a row-oriented distributed matrix without meaningful row indices, backed by an RDD
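
Reviewer note on the block layout described in the added BlockMatrix section: the docs say each block is keyed by an `(Int, Int)` index and holds a `rowsPerBlock` x `colsPerBlock` sub-matrix. The sketch below illustrates one way to read that, assuming a plain row/column grid in which entry `(i, j)` lands in block `(i / rowsPerBlock, j / colsPerBlock)`. The class and method names (`BlockIndexSketch`, `blockIndex`, `offsetInBlock`, `numBlocks`) are hypothetical and not part of Spark; this is plain Java arithmetic, not the library's implementation.

```java
// Hypothetical helper, not part of Spark: illustrates the assumed mapping
// between a global matrix index and a (block index, offset within block) pair.
public class BlockIndexSketch {

    /** Block coordinate along one axis that contains the given global index. */
    public static int blockIndex(long globalIndex, int perBlock) {
        return (int) (globalIndex / perBlock);
    }

    /** Offset of the global index inside its block along one axis. */
    public static int offsetInBlock(long globalIndex, int perBlock) {
        return (int) (globalIndex % perBlock);
    }

    /** Number of blocks needed to cover `size` rows (or columns), rounding up. */
    public static int numBlocks(long size, int perBlock) {
        return (int) ((size + perBlock - 1) / perBlock);
    }

    public static void main(String[] args) {
        // 1024 x 1024 is the default block size mentioned in the docs above.
        int rowsPerBlock = 1024;
        int colsPerBlock = 1024;
        long i = 2500L;
        long j = 100L;

        System.out.println("block index: ("
            + blockIndex(i, rowsPerBlock) + ", " + blockIndex(j, colsPerBlock) + ")");
        System.out.println("offset in block: ("
            + offsetInBlock(i, rowsPerBlock) + ", " + offsetInBlock(j, colsPerBlock) + ")");
        System.out.println("row blocks for 3000 rows: " + numBlocks(3000L, rowsPerBlock));
    }
}
```

Under these assumptions, entry `(2500, 100)` falls in block `(2, 0)` at offset `(452, 100)`, and a 3000-row matrix needs 3 row blocks, the last of which would be only partially filled.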