[ 
https://issues.apache.org/jira/browse/MAHOUT-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15269875#comment-15269875
 ] 

ASF GitHub Bot commented on MAHOUT-1837:
----------------------------------------

Github user andrewpalumbo commented on a diff in the pull request:

    https://github.com/apache/mahout/pull/228#discussion_r61976030
  
    --- Diff: math-scala/src/main/scala/org/apache/mahout/math/scalabindings/package.scala ---
    @@ -410,4 +412,34 @@ package object scalabindings {
     
       def dist(mxX: Matrix, mxY: Matrix): Matrix = sqDist(mxX, mxY) := sqrt _
     
    +  /**
    +    * Check the density of an in-core matrix based on supplied criteria.
    +    *
    +    * @param mxX  The matrix to check density of.
    +    * @param rowSparsityThreshold the proportion of the sampled rows which must be dense.
    +    * @param elementSparsityThreshold the proportion of the elements in a sampled row which must be non-zero for that row to count as dense.
    +    * @param sample how much of the matrix to sample.
    +    */
    +  def isMatrixDense(mxX: Matrix, rowSparsityThreshold: Double = .30, elementSparsityThreshold: Double = .30, sample: Double = .25): Boolean = {
    +    val rand = RandomUtils.getRandom
    +    val m = mxX.numRows()
    +    val numRowToTest: Int = (sample * m).toInt
    +
    +    var numDenseRows: Int = 0
    +
    +    for (i <- 0 until numRowToTest) {
    +      // select a row at random
    +      val row: Vector = mxX(rand.nextInt(m), ::)
    +      // check the sparsity of that row; if it is greater than the set sparsity threshold, count this row as dense
    +      if (row.getNumNonZeroElements / row.size().toDouble > elementSparsityThreshold) {
    +        numDenseRows = numDenseRows + 1
    +      }
    +    }
    +
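    +    // return whether the ratio of dense rows to tested rows exceeds rowSparsityThreshold
    +    // (convert to Double first: Int / Int would truncate the ratio to 0 or 1)
    +    numDenseRows.toDouble / numRowToTest > rowSparsityThreshold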
    +  }
    +
    --- End diff --
    
    @dlyubimov does this seem like a decent test for matrix density?  I've put
    in both an `elementSparsityThreshold` to determine whether a Vector itself is
    sparse, and a `rowSparsityThreshold` as a threshold for the entire matrix.
    I've also added a `Vector.mean()` method but am not sure if it is needed in
    this case.
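
The sampling check discussed in the comment above can be sketched without any Mahout dependency. This is only an illustration under stated assumptions: `DensitySketch`, `rowDensity`, and `looksDense` are hypothetical names, rows are plain `Array[Double]` rather than Mahout `Vector`s, and the default thresholds simply mirror the ones in the diff. It also guards against an empty matrix and a zero-sized sample, which the diff does not address.

```scala
import scala.util.Random

// Hypothetical, Mahout-free sketch of the sampling-based density check.
object DensitySketch {

  /** Fraction of non-zero entries in a single row. */
  def rowDensity(row: Array[Double]): Double =
    if (row.isEmpty) 0.0 else row.count(_ != 0.0).toDouble / row.length

  /**
    * Sample rows at random (with replacement) and report whether the share
    * of sampled rows counted as dense exceeds rowThreshold.
    */
  def looksDense(
      rows: Array[Array[Double]],
      rowThreshold: Double = 0.30,
      elementThreshold: Double = 0.30,
      sample: Double = 0.25,
      rand: Random = new Random(1234L)): Boolean = {
    val m = rows.length
    if (m == 0) {
      false
    } else {
      // Always test at least one row so the ratio below is well defined.
      val numRowsToTest = math.max(1, (sample * m).toInt)
      val numDenseRows = (0 until numRowsToTest).count { _ =>
        rowDensity(rows(rand.nextInt(m))) > elementThreshold
      }
      // Floating-point division: Int / Int would truncate the ratio.
      numDenseRows.toDouble / numRowsToTest > rowThreshold
    }
  }
}
```

For instance, `DensitySketch.looksDense(Array.fill(8, 4)(1.0))` evaluates to `true` and the same call on an all-zero matrix evaluates to `false`, whichever rows happen to be sampled.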


> Sparse/Dense Matrix analysis for Matrix Multiplication
> ------------------------------------------------------
>
>                 Key: MAHOUT-1837
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1837
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Math
>    Affects Versions: 0.12.0
>            Reporter: Andrew Palumbo
>            Assignee: Andrew Palumbo
>             Fix For: 0.12.1
>
>
> In matrix multiplication, sparse matrices can easily turn dense and bloat 
> memory: one fully dense column and one fully dense row can cause a sparse 
> %*% sparse operation to have a dense result.  
> There are two issues here, one with a quick fix and one a bit more involved:
>    #  In {{ABt.scala}}, check the `MatrixFlavor` of the combiner and use 
> the flavor of the block as the resulting sparse or dense matrix type:
> {code}
> val comb = if (block.getFlavor == MatrixFlavor.SPARSELIKE) {
>               new SparseMatrix(prodNCol, block.nrow).t
>             } else {
>               new DenseMatrix(prodNCol, block.nrow).t
>             }
> {code}
>  A similar check needs to be made in the {{blockify}} transformation.
>  
>    #  More importantly, and more involved: do an actual analysis of the 
> resulting matrix data in the in-core {{mmul}} class and use a matrix of the 
> appropriate structure as the result. 
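
The issue description's claim that one fully dense column and one fully dense row are enough to make a sparse %*% sparse product dense can be checked with a small worked example. The sketch below uses plain nested arrays instead of Mahout matrices; `SparseTimesSparse`, `density`, and `multiply` are hypothetical helper names introduced only for this illustration.

```scala
// Hypothetical, Mahout-free illustration: two 10% dense operands
// whose product has no zero entries at all.
object SparseTimesSparse {
  type Mat = Array[Array[Double]]

  /** Fraction of non-zero entries in the whole matrix. */
  def density(a: Mat): Double = {
    val total = a.map(_.length).sum
    if (total == 0) 0.0 else a.map(_.count(_ != 0.0)).sum.toDouble / total
  }

  /** Naive dense matrix product; assumes a's column count equals b's row count. */
  def multiply(a: Mat, b: Mat): Mat = {
    val inner = b.length
    val cols = if (inner == 0) 0 else b(0).length
    Array.tabulate(a.length, cols) { (i, j) =>
      (0 until inner).map(k => a(i)(k) * b(k)(j)).sum
    }
  }

  val n = 10
  // a: only column 0 is non-zero; b: only row 0 is non-zero.
  val a: Mat = Array.tabulate(n, n)((_, j) => if (j == 0) 1.0 else 0.0)
  val b: Mat = Array.tabulate(n, n)((i, _) => if (i == 0) 1.0 else 0.0)
  val product: Mat = multiply(a, b)
}
```

Each operand has 10 non-zero entries out of 100, yet every entry of `product` equals `a(i)(0) * b(0)(j)`, which is 1.0, so the result is fully dense. This is the situation in which allocating the result as a sparse structure would be the wrong choice.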



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
