Alex Favaro created SPARK-31122:
-----------------------------------
Summary: Add support for sparse matrix multiplication
Key: SPARK-31122
URL: https://issues.apache.org/jira/browse/SPARK-31122
Project: Spark
Issue Type: New Feature
Components: MLlib
Affects Versions: 2.4.5
Reporter: Alex Favaro
MLlib does not currently support multiplication of sparse matrices. When
multiplying block matrices with sparse blocks, the sparse blocks are first
converted to dense matrices. This leads to large increases in memory
utilization for certain problems.
I'd like to propose adding support for local sparse matrix multiplication to
MLlib, as well as local dense-sparse matrix multiplication. With these changes,
the case clause which converts sparse blocks to dense matrices in the block
matrix multiply method could be removed.
One question is whether the result of sparse-sparse matrix multiplication
should be sparse or dense, since the product of two sparse matrices can be
quite dense depending on the matrices. I propose returning a sparse matrix,
however, and letting the application convert the result to a dense matrix if
necessary. There is some precedent for this with the block matrix add method,
which returns sparse matrix blocks even when adding a sparse matrix block to a
dense matrix block.
As for the implementation, one option would be to leverage Breeze's existing
sparse matrix multiplication, as MLlib currently does for matrix addition.
Another would be to add support for sparse-sparse multiplication to the BLAS
wrapper, which would be consistent with the sparse-dense multiplication
implementation and could support a more efficient routine for transposed
matrices (as Breeze does not support transposed matrices). The exact algorithm
would follow that laid out in ["Sparse Matrix Multiplication Package
(SMMP)"|https://www.i2m.univ-amu.fr/perso/abdallah.bradji/multp_sparse.pdf].
This would likely not be a huge change but would take some time to test and
benchmark properly, so before I put up a code diff I would be curious to know:
* Is there any interest in supporting this functionality in MLlib?
* Is there a preference for the return type of sparse-sparse multiplication?
(i.e. sparse or dense)
* Is there a preference for the implementation? (Breeze vs a built-in one)
Some tickets which include related functionality or identified this particular
issue but never solved it: SPARK-16820, SPARK-3418.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]