[ 
https://issues.apache.org/jira/browse/SPARK-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171272#comment-14171272
 ] 

Shivaram Venkataraman commented on SPARK-3434:
----------------------------------------------

Sorry for the delay in getting back -- I've posted a design doc at 
http://goo.gl/0eE5fh and a reference implementation at 
https://github.com/amplab/ml-matrix. The design doc and the reference 
implementation use Spark as a library -- so this works as a standalone library 
in case somebody wants to try it out.

Some more points to note regarding the integration:
1. The existing implementation uses Breeze matrices in the interface, but we 
will change that to use the local Matrix trait already present in Spark.
2. The matrix layouts will also extend the DistributedMatrix trait in MLlib, and 
we could create a new interface, BlockDistributedMatrix, from the interface in 
amplab/ml-matrix.
3. We can use this JIRA or create a new one to discuss which algorithms / 
operations should be merged into Spark. I think TSQR and NormalEquations should 
be pretty useful. Other algorithms like 2-D BlockQR and BlockCoordinateDescent 
can be merged later if we feel they are useful (these haven't been pushed to 
ml-matrix yet).
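
To illustrate the idea behind a normal-equations solver on a row-partitioned 
matrix, here is a minimal sketch in plain Python. It is not the ml-matrix or 
Spark API: Python lists stand in for an RDD of row blocks, and all function 
names (gram_and_rhs, add_partials, solve, normal_equations) are hypothetical. 
Each block contributes A_i^T A_i and A_i^T b_i, the partial results are summed 
with a reduce, and the small n x n system is then solved locally.

```python
from functools import reduce

# Illustrative sketch only: a list of (rows, rhs) pairs stands in for an RDD
# of row blocks. All names here are hypothetical, not ml-matrix or Spark APIs.

def gram_and_rhs(block_rows, block_rhs):
    """Return (A_i^T A_i, A_i^T b_i) for a single row block."""
    n = len(block_rows[0])
    ata = [[0.0] * n for _ in range(n)]
    atb = [0.0] * n
    for row, y in zip(block_rows, block_rhs):
        for i in range(n):
            atb[i] += row[i] * y
            for j in range(n):
                ata[i][j] += row[i] * row[j]
    return ata, atb

def add_partials(p, q):
    """Element-wise sum of two (A^T A, A^T b) partial results."""
    ata = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(p[0], q[0])]
    atb = [a + b for a, b in zip(p[1], q[1])]
    return ata, atb

def solve(m, v):
    """Gaussian elimination with partial pivoting on a small dense system."""
    n = len(v)
    aug = [list(m[i]) + [v[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x

def normal_equations(blocks):
    """Least-squares solution of A x = b, with A and b split into row blocks."""
    partials = [gram_and_rhs(rows, rhs) for rows, rhs in blocks]  # "map" step
    ata, atb = reduce(add_partials, partials)                     # "reduce" step
    return solve(ata, atb)

# Four samples of b = 1 + 2*t, split into two row blocks of two rows each.
blocks = [
    ([[1.0, 1.0], [1.0, 2.0]], [3.0, 5.0]),
    ([[1.0, 3.0], [1.0, 4.0]], [7.0, 9.0]),
]
x = normal_equations(blocks)
print(x)
```

Forming A^T A squares the condition number of A, so a QR-based approach such as 
TSQR is the usual choice when A is ill-conditioned.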

I will create a first patch for the matrix formats in a couple of days. Please 
let me know if there are any questions / clarifications.

> Distributed block matrix
> ------------------------
>
>                 Key: SPARK-3434
>                 URL: https://issues.apache.org/jira/browse/SPARK-3434
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>            Reporter: Xiangrui Meng
>
> This JIRA is for discussing distributed matrices stored in block 
> sub-matrices. The main challenge is the partitioning scheme to allow adding 
> linear algebra operations in the future, e.g.:
> 1. matrix multiplication
> 2. matrix factorization (QR, LU, ...)
> Let's discuss the partitioning and storage and how they fit into the above 
> use cases.
> Questions:
> 1. Should it be backed by a single RDD that contains all of the sub-matrices 
> or by many RDDs, each containing only one sub-matrix?


