[ https://issues.apache.org/jira/browse/SPARK-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171272#comment-14171272 ]
Shivaram Venkataraman commented on SPARK-3434:
----------------------------------------------
Sorry for the delay in getting back -- I've posted a design doc at
http://goo.gl/0eE5fh and a reference implementation at
https://github.com/amplab/ml-matrix. The design doc and the reference
implementation use Spark as a library -- so this works as a standalone library
in case somebody wants to try it out.
Some more points to note regarding the integration:
1. The existing implementation uses Breeze matrices in the interface, but we
will change that to use the local Matrix trait already present in Spark.
2. The matrix layouts will also extend the DistributedMatrix trait in MLlib, and
we could create a new interface, BlockDistributedMatrix, from the interface in
amplab/ml-matrix.
3. We can also use this JIRA or create a new JIRA to discuss what algorithms /
operations should be merged into Spark. I think TSQR and NormalEquations should
be pretty useful. Other algorithms like 2-D BlockQR and BlockCoordinateDescent
can be merged later if we feel they are useful (these haven't been pushed to
ml-matrix yet).
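To illustrate why a solver like NormalEquations fits a row-partitioned layout,
here is a minimal sketch in plain Python. It is not the ml-matrix or Spark API:
the function names (gram_and_rhs, add_pair, solve, normal_equations) are
hypothetical, plain lists stand in for the distributed row blocks, and the
sketch assumes A has full column rank. Each block contributes A_i^T A_i and
A_i^T b_i, the contributions are summed with a reduce, and the small n x n
system is solved locally.

```python
from functools import reduce

# Sketch only: plain Python lists stand in for distributed row blocks.
# All names are hypothetical; none come from Spark or amplab/ml-matrix.


def gram_and_rhs(block_a, block_b):
    """Per-block contribution: (A_i^T A_i, A_i^T b_i)."""
    n = len(block_a[0])
    ata = [[0.0] * n for _ in range(n)]
    atb = [0.0] * n
    for row, y in zip(block_a, block_b):
        for p in range(n):
            atb[p] += row[p] * y
            for q in range(n):
                ata[p][q] += row[p] * row[q]
    return ata, atb


def add_pair(left, right):
    """Combine two partial results element-wise (the reduce step)."""
    (ata_l, atb_l), (ata_r, atb_r) = left, right
    ata = [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(ata_l, ata_r)]
    atb = [x + y for x, y in zip(atb_l, atb_r)]
    return ata, atb


def solve(m, v):
    """Solve the small system m x = v by Gaussian elimination
    with partial pivoting. Assumes m is non-singular."""
    n = len(v)
    aug = [list(row) + [rhs] for row, rhs in zip(m, v)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def normal_equations(blocks):
    """blocks: iterable of (A_i, b_i) row blocks; returns least-squares x."""
    ata, atb = reduce(add_pair, (gram_and_rhs(a, b) for a, b in blocks))
    return solve(ata, atb)


# Two row blocks of a system whose exact solution is x = (2, 3).
blocks = [
    ([[1.0, 0.0], [0.0, 1.0]], [2.0, 3.0]),
    ([[1.0, 1.0], [2.0, 1.0]], [5.0, 7.0]),
]
x = normal_equations(blocks)
```

Only the n x n and n x 1 partial sums cross block boundaries, so the amount of
data combined does not depend on the number of rows. Forming A^T A squares the
condition number, which is one reason a QR-based approach such as TSQR is
useful alongside it.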
I will create a first patch for the matrix formats in a couple of days. Please
let me know if there are any questions / clarifications.
> Distributed block matrix
> ------------------------
>
> Key: SPARK-3434
> URL: https://issues.apache.org/jira/browse/SPARK-3434
> Project: Spark
> Issue Type: New Feature
> Components: MLlib
> Reporter: Xiangrui Meng
>
> This JIRA is for discussing distributed matrices stored in block
> sub-matrices. The main challenge is the partitioning scheme to allow adding
> linear algebra operations in the future, e.g.:
> 1. matrix multiplication
> 2. matrix factorization (QR, LU, ...)
> Let's discuss the partitioning and storage and how they fit into the above
> use cases.
> Questions:
> 1. Should it be backed by a single RDD that contains all of the sub-matrices
> or by many RDDs, each containing only one sub-matrix?