[ https://issues.apache.org/jira/browse/SPARK-5766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14318013#comment-14318013 ]
Amaru Cuba Gyllensten commented on SPARK-5766:
----------------------------------------------

Yeah, I noticed it when multiplying a 10,000 by 2,000 IndexedRowMatrix with its transpose (represented as a local matrix), and then doing some reductions on the rows. Running on my local machine, the multiplication in Spark took about 7 times longer than an implementation where the left-hand matrix was chunked and each chunk (consisting of ~1,000 rows) was multiplied with gemm (or similar).

This might be an unfair comparison, as it kind of requires the rows to be stored locally as dense matrices. (A use case that might be covered by the upcoming BlockMatrix?)

> Slow RowMatrix multiplication
> -----------------------------
>
>                 Key: SPARK-5766
>                 URL: https://issues.apache.org/jira/browse/SPARK-5766
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>            Reporter: Amaru Cuba Gyllensten
>            Priority: Minor
>              Labels: matrix
>
> Looking at the source code for RowMatrix multiplication by a local matrix, it
> seems like it goes through all column vectors of the local matrix, computing
> a separate dot product between each row and each column.
> It seems like this could be sped up by using gemm, performing full
> matrix-matrix multiplication on the local data (or gemv, for vector-matrix
> multiplication), as is done in BlockMatrix or Matrix.
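The two strategies being compared can be sketched as follows. This is a minimal illustration under stated assumptions, in plain Python with no Spark or BLAS involved. `per_column_dot` mirrors the per-column dot-product pattern the issue describes. `chunked_gemm` follows the approach from the comment: stack roughly 1,000 rows into a dense block and multiply the whole block in one matrix-matrix product. All function names here are hypothetical. The hand-written `gemm` only stands in for an optimized BLAS `dgemm` call, so in pure Python it shows the structure of the computation, not the speedup.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Sequence

Matrix = List[List[float]]  # row-major: a list of rows


def per_column_dot(rows: Iterable[Sequence[float]], b: Matrix) -> Iterator[List[float]]:
    """Pattern described in the issue: for each row, one dot product per column of B."""
    n, k = len(b), len(b[0])
    for row in rows:
        yield [sum(row[p] * b[p][j] for p in range(n)) for j in range(k)]


def gemm(a: Matrix, b: Matrix) -> Matrix:
    """C = A * B as one matrix-matrix product (stand-in for a BLAS dgemm call)."""
    k = len(b[0])
    c = [[0.0] * k for _ in a]
    for i, a_row in enumerate(a):
        c_row = c[i]
        # Accumulate a_ip times row p of B into row i of C.
        for p, a_ip in enumerate(a_row):
            b_row = b[p]
            for j in range(k):
                c_row[j] += a_ip * b_row[j]
    return c


def chunked_gemm(rows: Iterable[Sequence[float]], b: Matrix,
                 chunk_size: int = 1000) -> Iterator[List[float]]:
    """Stack up to chunk_size rows into a dense block and multiply it by B in one call."""
    it = iter(rows)
    while True:
        block = [list(r) for r in islice(it, chunk_size)]
        if not block:
            return
        yield from gemm(block, b)


if __name__ == "__main__":
    a = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    b = [[1.0, 0.0, 2.0], [0.0, 1.0, 3.0]]
    print(list(chunked_gemm(a, b, chunk_size=2)))
    # [[1.0, 2.0, 8.0], [3.0, 4.0, 18.0], [5.0, 6.0, 28.0]]
```

Both functions produce the same rows. Any gain comes from handing a whole block to an optimized library routine, which can block and vectorize across many rows at once. Issuing one dot product per (row, column) pair gives it no such opportunity. Inside Spark, a similar grouping could presumably be done per partition (for example with `RDD.mapPartitions`). The cost is materializing each chunk as a dense block, which is the trade-off the comment points out.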