[
https://issues.apache.org/jira/browse/MAHOUT-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15440540#comment-15440540
]
ASF GitHub Bot commented on MAHOUT-1837:
----------------------------------------
Github user andrewpalumbo commented on a diff in the pull request:
https://github.com/apache/mahout/pull/252#discussion_r76509011
--- Diff:
spark/src/main/scala/org/apache/mahout/sparkbindings/drm/package.scala ---
@@ -60,26 +60,22 @@ package object drm {
val keys = data.map(t => t._1).toArray[K]
val vectors = data.map(t => t._2).toArray
- // create the block by default as dense.
- // would probably be better to sample a subset of these
- // vectors first before creating the entire matrix.
- // so that we don't have the overhead of creating a full second matrix in
- // the case that the matrix is not dense.
- val block = new DenseMatrix(vectors.length, blockncol)
- var row = 0
- while (row < vectors.length) {
- block(row, ::) := vectors(row)
- row += 1
- }
+ // create the block by default as Sparse.
+ val block = new SparseRowMatrix(vectors.length, blockncol, vectors, true, false)
- // Test the density of the data. If the matrix does not meet the
- // requirements for density, convert the Vectors to a sparse Matrix.
+ // Test the density of the data. If the matrix does meets the
+ // requirements for density, convert the Vectors to a DenseMatrix.
val resBlock = if (densityAnalysis(block)) {
- block
+ val dBlock = new DenseMatrix(vectors.length, blockncol)
+ var row = 0
+ while (row < vectors.length) {
+ dBlock(row, ::) := vectors(row)
+ row += 1
+ }
+ dBlock
} else {
- new SparseRowMatrix(vectors.length, blockncol, vectors, true, false)
--- End diff --
Should the default here be a `SparseRowMatrix` of random-access sparse
vectors? It seems so, i.e. this line should probably read:
```java
new SparseRowMatrix(vectors.length, blockncol, vectors, true, true)
```
rather than:
```java
new SparseRowMatrix(vectors.length, blockncol, vectors, true, false)
```
as it does now, correct?
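For readers following the diff, the "build sparse first, convert to dense only if the density test passes" pattern can be sketched without Mahout on the classpath. This is only an illustration under stated assumptions: `SparseRow`, `SparseBlock`, `DenseBlock`, `isDense` and the 0.25 threshold are hypothetical stand-ins, not Mahout's `SparseRowMatrix`, `DenseMatrix` or `densityAnalysis`.

```scala
object BlockifySketch {
  // Hypothetical stand-in for a sparse vector: column index -> non-zero value.
  type SparseRow = Map[Int, Double]

  sealed trait Block
  final case class SparseBlock(rows: Vector[SparseRow], ncol: Int) extends Block
  final case class DenseBlock(values: Array[Array[Double]]) extends Block

  // Assumed density test: stored non-zeros divided by total cells, compared
  // with an illustrative threshold. An empty block is treated as sparse.
  def isDense(rows: Seq[SparseRow], ncol: Int, threshold: Double = 0.25): Boolean =
    if (rows.isEmpty || ncol == 0) false
    else rows.map(_.size.toLong).sum.toDouble / (rows.length.toLong * ncol) > threshold

  // Keep the rows sparse by default; allocate the dense array only when the
  // density test says it is worthwhile.
  def blockify(rows: Vector[SparseRow], ncol: Int): Block =
    if (isDense(rows, ncol)) {
      val values = Array.ofDim[Double](rows.length, ncol)
      for ((row, r) <- rows.zipWithIndex; (c, v) <- row) values(r)(c) = v
      DenseBlock(values)
    } else {
      SparseBlock(rows, ncol)
    }
}
```

With this ordering the dense allocation happens only for blocks that actually turn out to be dense, which is the overhead the removed comment in the diff was concerned about.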
> Sparse/Dense Matrix analysis for Matrix Multiplication
> ------------------------------------------------------
>
> Key: MAHOUT-1837
> URL: https://issues.apache.org/jira/browse/MAHOUT-1837
> Project: Mahout
> Issue Type: Improvement
> Components: Math
> Affects Versions: 0.12.0
> Reporter: Andrew Palumbo
> Assignee: Andrew Palumbo
> Fix For: 0.13.0
>
> Attachments: compareDensityTest.ods
>
>
> In matrix multiplication, sparse matrices can easily turn dense and bloat
> memory: one fully dense column and one fully dense row can cause a sparse
> %*% sparse operation to have a dense result.
> There are two issues here, one with a quick fix and one a bit more involved:
> # In {{ABt.scala}}, check the `MatrixFlavor` of the combiner and use
> the flavor of the block as the resulting sparse or dense matrix type:
> {code}
> val comb = if (block.getFlavor == MatrixFlavor.SPARSELIKE) {
> new SparseMatrix(prodNCol, block.nrow).t
> } else {
> new DenseMatrix(prodNCol, block.nrow).t
> }
> {code}
> A similar check needs to be made in the {{blockify}} transformation.
>
> # More importantly, and more involved, is to do an actual analysis of the
> resulting matrix data in the in-core {{mmul}} class and use a matrix of the
> appropriate structure as the result.
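
The claim in the issue description that one fully dense column and one fully dense row are enough to make a sparse product dense can be checked with a small self-contained sketch. It uses a hypothetical map-based sparse representation (`Sparse`, `multiply`, `density` are illustrative names, not Mahout APIs) and a naive product.

```scala
object DensityBlowup {
  // Hypothetical sparse matrix: only non-zero cells are stored, keyed by (row, col).
  type Sparse = Map[(Int, Int), Double]

  // Naive sparse product: C(i, j) = sum over k of A(i, k) * B(k, j).
  def multiply(a: Sparse, b: Sparse): Sparse = {
    // Index B by row so each A(i, k) only meets the entries B(k, *).
    val bByRow = b.groupBy { case ((k, _), _) => k }
    val products = for {
      ((i, k), av) <- a.toSeq
      ((_, j), bv) <- bByRow.getOrElse(k, Map.empty[(Int, Int), Double]).toSeq
    } yield ((i, j), av * bv)
    products.groupBy(_._1).map { case (ij, ps) => ij -> ps.map(_._2).sum }
  }

  // Fraction of cells that hold a stored non-zero value.
  def density(m: Sparse, nrow: Int, ncol: Int): Double =
    m.size.toDouble / (nrow.toLong * ncol)

  val n = 100
  // A has a single fully dense column (column 0); B has a single fully dense row (row 0).
  val a: Sparse = (0 until n).map(i => (i, 0) -> 1.0).toMap
  val b: Sparse = (0 until n).map(j => (0, j) -> 1.0).toMap
  // Every cell C(i, j) = A(i, 0) * B(0, j) is non-zero.
  val c: Sparse = multiply(a, b)
}
```

Here both inputs store 100 of 10,000 cells, while the product stores all 10,000, so choosing the result type from the operands' flavor alone would pick a sparse container for fully dense data. That is the case the second item, analysing the actual result data, is meant to cover.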