[ https://issues.apache.org/jira/browse/MAHOUT-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15440647#comment-15440647 ]

ASF GitHub Bot commented on MAHOUT-1837:
----------------------------------------

Github user dlyubimov commented on a diff in the pull request:

    https://github.com/apache/mahout/pull/252#discussion_r76509896
  
    --- Diff: spark/src/main/scala/org/apache/mahout/sparkbindings/drm/package.scala ---
    @@ -60,26 +60,22 @@ package object drm {
             val keys = data.map(t => t._1).toArray[K]
             val vectors = data.map(t => t._2).toArray
     
    -        // create the block by default as dense.
    -        // would probably be better to sample a subset of these
    -        // vectors first before creating the entire matrix.
    -        // so that we don't have the overhead of creating a full second matrix in
    -        // the case that the matrix is not dense.
    -        val block = new DenseMatrix(vectors.length, blockncol)
    -        var row = 0
    -        while (row < vectors.length) {
    -          block(row, ::) := vectors(row)
    -          row += 1
    -        }
    +        // create the block by default as Sparse.
    +        val block = new SparseRowMatrix(vectors.length, blockncol, vectors, true, false)
     
    -        // Test the density of the data. If the matrix does not meet the
    -        // requirements for density, convert the Vectors to a sparse Matrix.
    +        // Test the density of the data. If the matrix does meet the
    +        // requirements for density, convert the Vectors to a DenseMatrix.
             val resBlock = if (densityAnalysis(block)) {
    -          block
    +          val dBlock = new DenseMatrix(vectors.length, blockncol)
    +          var row = 0
    +          while (row < vectors.length) {
    +            dBlock(row, ::) := vectors(row)
    +            row += 1
    +          }
    +          dBlock
             } else {
    -          new SparseRowMatrix(vectors.length, blockncol, vectors, true, false)
    --- End diff --
    
    No, I believe sequential should stay, as it will be more natural (ordered)
    when converted to CSR, which hopefully will be our most common modus operandi
    for exponential algorithms.
    
    In fact, random access rarely makes sense at all for block-wise algorithms.
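    
    For illustration, here is a minimal sketch (not part of the PR) contrasting the
    two constructions discussed here, assuming Mahout's SparseRowMatrix(rows, columns,
    rowVectors, shallowCopy, randomAccess) constructor and the sequential/random
    access sparse vector classes; the row count and block width are made up:
    
    {code}
    import org.apache.mahout.math.{RandomAccessSparseVector, SequentialAccessSparseVector, SparseRowMatrix, Vector}
    
    val blockncol = 100  // illustrative block width
    
    // Sequential-access rows keep their non-zeros ordered by index, so a later
    // conversion to a CSR-style layout can walk each row in order without sorting.
    val seqRows: Array[Vector] = Array.tabulate(4)(_ => new SequentialAccessSparseVector(blockncol))
    val seqBlock = new SparseRowMatrix(seqRows.length, blockncol, seqRows, true, false)  // randomAccess = false
    
    // Random-access rows hash their non-zeros: O(1) random writes, but element
    // order is not preserved, which buys nothing for block-wise sequential scans.
    val randRows: Array[Vector] = Array.tabulate(4)(_ => new RandomAccessSparseVector(blockncol))
    val randBlock = new SparseRowMatrix(randRows.length, blockncol, randRows, true, true)  // randomAccess = true
    {code}
    
    With randomAccess = false the block's rows iterate in index order, which is the
    ordering property the CSR remark above relies on.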
    
    On Fri, Aug 26, 2016 at 8:03 PM, Andrew Palumbo <[email protected]>
    wrote:
    
    > In spark/src/main/scala/org/apache/mahout/sparkbindings/drm/package.scala
    > <https://github.com/apache/mahout/pull/252#discussion_r76509011>:
    >
    > >          } else {
    > > -          new SparseRowMatrix(vectors.length, blockncol, vectors, true, false)
    >
    > Should the default here be a SparseRowMatrix of Random Access Sparse
    > Vectors? It seems so, i.e. this line should probably read:
    >
    > new SparseRowMatrix(vectors.length, blockncol, vectors, true, true)
    >
    > rather than:
    >
    > new SparseRowMatrix(vectors.length, blockncol, vectors, true, false)
    >
    > as is, correct?
    >



> Sparse/Dense Matrix analysis for Matrix Multiplication
> ------------------------------------------------------
>
>                 Key: MAHOUT-1837
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1837
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Math
>    Affects Versions: 0.12.0
>            Reporter: Andrew Palumbo
>            Assignee: Andrew Palumbo
>             Fix For: 0.13.0
>
>         Attachments: compareDensityTest.ods
>
>
> In matrix multiplication, sparse matrices can easily turn dense and bloat 
> memory: one fully dense column and one fully dense row are enough to make a 
> sparse %*% sparse operation produce a dense result.
> There are two issues here, one with a quick fix and one a bit more involved:
>    #  In {{ABt.scala}}, check the {{MatrixFlavor}} of the combiner and use 
> the flavor of the block to choose the resulting sparse or dense matrix type:
> {code}
> val comb = if (block.getFlavor == MatrixFlavor.SPARSELIKE) {
>               new SparseMatrix(prodNCol, block.nrow).t
>             } else {
>               new DenseMatrix(prodNCol, block.nrow).t
>             }
> {code}
>  A similar check needs to be made in the {{blockify}} transformation.
>  
>    #  More importantly, and more involved: do an actual analysis of the 
> resulting matrix data in the in-core {{mmul}} class and use a matrix of the 
> appropriate structure as the result (a rough sketch follows below).
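> A minimal sketch of that structural choice, assuming the {{getFlavor}} / 
> {{MatrixFlavor}} API used above; {{chooseProductStructure}} is a hypothetical 
> helper, not an existing Mahout API, and a real fix would also analyze the data:
> {code}
> import org.apache.mahout.math.{DenseMatrix, Matrix, SparseMatrix}
> import org.apache.mahout.math.flavor.MatrixFlavor
> 
> // Pick the structure of the product a %*% b from the operand flavors.
> // Sketch only: sparse %*% sparse can still be dense, so the real analysis
> // in mmul would have to look at the data, not just the flavors.
> def chooseProductStructure(a: Matrix, b: Matrix): Matrix = {
>   val bothSparse = a.getFlavor == MatrixFlavor.SPARSELIKE &&
>                    b.getFlavor == MatrixFlavor.SPARSELIKE
>   if (bothSparse) new SparseMatrix(a.numRows(), b.numCols())
>   else new DenseMatrix(a.numRows(), b.numCols())
> }
> {code}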



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
