[ https://issues.apache.org/jira/browse/SPARK-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171272#comment-14171272 ]
Shivaram Venkataraman commented on SPARK-3434:
----------------------------------------------

Sorry for the delay in getting back -- I've posted a design doc at http://goo.gl/0eE5fh and a reference implementation at https://github.com/amplab/ml-matrix. The design doc and the reference implementation use Spark as a library, so this works as a standalone library in case somebody wants to try it out.

Some more points to note regarding the integration:

1. The existing implementation uses breeze matrices in the interface, but we will change that to use the local Matrix trait already present in Spark.
2. The matrix layouts will also extend the DistributedMatrix class in MLlib, and we could create a new interface BlockDistributedMatrix from the interface in amplab/ml-matrix.
3. We can also use this JIRA or create a new JIRA to discuss what algorithms / operations should be merged into Spark. I think TSQR and NormalEquations should be pretty useful. Other algorithms like 2-D BlockQR and BlockCoordinateDescent can be merged later if we feel they are useful (these haven't been pushed to ml-matrix yet).

I will create a first patch for the matrix formats in a couple of days. Please let me know if there are any questions / clarifications.

> Distributed block matrix
> ------------------------
>
> Key: SPARK-3434
> URL: https://issues.apache.org/jira/browse/SPARK-3434
> Project: Spark
> Issue Type: New Feature
> Components: MLlib
> Reporter: Xiangrui Meng
>
> This JIRA is for discussing distributed matrices stored in block
> sub-matrices. The main challenge is the partitioning scheme to allow adding
> linear algebra operations in the future, e.g.:
> 1. matrix multiplication
> 2. matrix factorization (QR, LU, ...)
> Let's discuss the partitioning and storage and how they fit into the above
> use cases.
> Questions:
> 1. Should it be backed by a single RDD that contains all of the sub-matrices
> or by many RDDs, each containing only one sub-matrix?
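To make question 1 in the issue description a bit more concrete, here is a minimal sketch of the "single collection of sub-matrices" option, written in plain Python rather than against Spark. It is not the ml-matrix or MLlib API: the dict keyed by (block_row, block_col) only stands in for a keyed RDD, and every function name below (split_into_blocks, block_multiply, assemble, ...) is hypothetical. The multiplication pairs blocks on the shared inner index and sums partial products per output key, which is roughly what a join followed by a reduce-by-key would do.

```python
from collections import defaultdict


def split_into_blocks(matrix, block_size):
    """Split a dense matrix (list of rows) into a dict keyed by (block_row, block_col)."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    blocks = {}
    for i in range(0, n_rows, block_size):
        for j in range(0, n_cols, block_size):
            # Edge blocks may be smaller than block_size when sizes do not divide evenly.
            blocks[(i // block_size, j // block_size)] = [
                row[j:j + block_size] for row in matrix[i:i + block_size]
            ]
    return blocks


def local_multiply(a, b):
    """Dense product of two small local blocks."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def local_add(a, b):
    """Element-wise sum of two blocks of the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def block_multiply(a_blocks, b_blocks):
    """Multiply two block matrices stored as {(block_row, block_col): block}."""
    # Group B's blocks by block row to emulate a join on the inner index k.
    b_by_row = defaultdict(list)
    for (k, j), block in b_blocks.items():
        b_by_row[k].append((j, block))

    result = {}
    for (i, k), a_block in a_blocks.items():
        for j, b_block in b_by_row[k]:
            partial = local_multiply(a_block, b_block)
            # Emulate reduce-by-key: sum partial products that share an output key.
            if (i, j) in result:
                result[(i, j)] = local_add(result[(i, j)], partial)
            else:
                result[(i, j)] = partial
    return result


def assemble(blocks):
    """Stitch a complete grid of blocks back into one dense matrix."""
    n_block_rows = max(i for i, _ in blocks) + 1
    n_block_cols = max(j for _, j in blocks) + 1
    rows = []
    for i in range(n_block_rows):
        for r in range(len(blocks[(i, 0)])):
            row = []
            for j in range(n_block_cols):
                row.extend(blocks[(i, j)][r])
            rows.append(row)
    return rows
```

For example, splitting two 3x3 matrices with block_size=2, calling block_multiply on the two dicts, and passing the result to assemble should reproduce the ordinary dense product. In a distributed setting the interesting part is the one this sketch glosses over: how the (block_row, block_col) keys are partitioned determines how much data the join has to shuffle.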