@ssc

Re: SparseRowMatrices from dense operations, there are some operations that use 
`SparseRowMatrix` as the default for the accumulator in their combiners.  E.g.,

Spark ABt: 
https://github.com/apache/mahout/blob/master/spark/src/main/scala/org/apache/mahout/sparkbindings/blas/ABt.scala#L296
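
For context, a minimal sketch of that combiner pattern (not the actual ABt code; the 
method name and shapes here are illustrative): the accumulator block is allocated as a 
`SparseRowMatrix` up front, regardless of how dense the incoming row blocks are, and 
partial rows are summed into it.

```scala
import org.apache.mahout.math.{Matrix, SparseRowMatrix, Vector}
import org.apache.mahout.math.function.Functions

// Hypothetical, simplified combiner-side accumulation: the accumulator is
// always sparse, even if the incoming partial rows are effectively dense.
def combineBlocks(nrow: Int, ncol: Int, partials: Iterator[(Int, Vector)]): Matrix = {
  val acc: Matrix = new SparseRowMatrix(nrow, ncol)   // sparse by default
  partials.foreach { case (rowIdx, partialRow) =>
    // element-wise add the partial row into the accumulator row
    acc.viewRow(rowIdx).assign(partialRow, Functions.PLUS)
  }
  acc
}
```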


I believe it was implemented this way so that, in the worst case of an oversized 
in-core Sparse %*% Dense matrix multiplication, a result too large to hold densely 
would not throw an OOM error.  This is what we created the densityAnalysis(..) 
method for: to detect the actual density of a matrix on the fly and to use the 
appropriate structure based on the data itself.
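
As a rough sketch of the idea (this is a stand-in, not the actual densityAnalysis(..) 
signature; the 25% threshold is just illustrative), the accumulator structure would be 
picked from the observed data rather than assumed sparse:

```scala
import org.apache.mahout.math.{DenseMatrix, Matrix, SparseRowMatrix}

// Stand-in for densityAnalysis(..): estimate density from per-row non-default counts.
def looksDense(m: Matrix, threshold: Double = 0.25): Boolean = {
  var nnz = 0L
  for (r <- 0 until m.numRows()) nnz += m.viewRow(r).getNumNondefaultElements
  nnz.toDouble / (m.numRows().toLong * m.numCols()) > threshold
}

// Choose the accumulator structure based on the data itself.
def newAccumulatorLike(sample: Matrix): Matrix =
  if (looksDense(sample)) new DenseMatrix(sample.numRows(), sample.numCols())
  else new SparseRowMatrix(sample.numRows(), sample.numCols())
```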


It is not actually being used in Spark ABt yet, though.  There is a Jira open to 
go through and use densityAnalysis() in all appropriate cases: 
https://issues.apache.org/jira/browse/MAHOUT-1873?filter=-1


So currently, ABt (and possibly some other operations) will return a 
`SparseRowMatrix` as the result of multiplying two dense matrices (if I'm reading 
it correctly).


It looks like this is a good candidate for densityAnalysis().
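
Sketch only: once densityAnalysis() (or any density predicate, e.g. the looksDense 
stand-in above) is wired in, an effectively dense result could be copied out of the 
sparse accumulator before it is returned, e.g.:

```scala
import org.apache.mahout.math.{DenseMatrix, Matrix}

// Hypothetical post-processing step: densify the result when the predicate says so.
def densifyIfNeeded(result: Matrix, isDense: Matrix => Boolean): Matrix =
  if (isDense(result)) {
    val d = new DenseMatrix(result.numRows(), result.numCols())
    d.assign(result)   // Matrix.assign(Matrix) copies element-wise
    d
  } else result
```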
