Hi Liquan,

There is some work being done on implementing linear algebra algorithms on Spark for use in higher-level machine learning algorithms. That work is happening in the MLlib project, which has an org.apache.spark.mllib.linalg package you may find useful.
See https://github.com/apache/spark/tree/master/mllib/src/main/scala/org/apache/spark/mllib/linalg

From my quick look (never read this code before and not familiar with MLlib) both the IndexedRowMatrix and RowMatrix implement a multiply operation:

aash@aash-mbp~/git/spark/mllib/src/main/scala/org/apache/spark/mllib/linalg$ git grep 'def multiply'
distributed/IndexedRowMatrix.scala:  def multiply(B: Matrix): IndexedRowMatrix = {
distributed/RowMatrix.scala:  def multiply(B: Matrix): RowMatrix = {
aash@aash-mbp~/git/spark/mllib/src/main/scala/org/apache/spark/mllib/linalg$

Can you look into using that code and let us know if it meets your needs?

Thanks!
Andrew

On Sat, May 17, 2014 at 10:28 PM, Liquan Pei <liquan...@gmail.com> wrote:

> Hi
>
> I am currently implementing an algorithm involving matrix multiplication.
> Basically, I have matrices represented as RDD[Array[Double]]. For example,
> If I have A:RDD[Array[Double]] and B:RDD[Array[Double]] and what would be
> the most efficient way to get C = A * B
>
> Both A and B are large, so it would not be possible to save either of them
> in memory.
>
> Thanks a lot for your help!
>
> Liquan
>
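For reference, here is a rough sketch of what calling the RowMatrix multiply mentioned above might look like when starting from an RDD[Array[Double]]. The object name RowMatrixMultiplySketch and the helper multiplyByLocal are made up for illustration, and the local[2] master is only there so the sketch runs standalone. Note the signature multiply(B: Matrix): B is a local matrix, so this only covers the case where B fits in memory on a single machine. It does not by itself handle two large distributed matrices.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.{Matrices, Matrix, Vectors}
import org.apache.spark.mllib.linalg.distributed.RowMatrix
import org.apache.spark.rdd.RDD

object RowMatrixMultiplySketch {

  // Hypothetical helper (not part of MLlib): computes A * B where A is
  // distributed by rows and B is a small local matrix given as rows.
  def multiplyByLocal(a: RDD[Array[Double]],
                      b: Array[Array[Double]]): RDD[Array[Double]] = {
    val numRows = b.length
    val numCols = b.head.length
    // Matrices.dense expects values in column-major order, so transpose
    // the row-major input before flattening it.
    val bLocal: Matrix = Matrices.dense(numRows, numCols, b.transpose.flatten)
    // Wrap each row of A in an MLlib Vector to build the RowMatrix.
    val aDist = new RowMatrix(a.map(row => Vectors.dense(row)))
    // multiply sends B to the workers and multiplies each row of A by it.
    aDist.multiply(bLocal).rows.map(_.toArray)
  }

  def main(args: Array[String]): Unit = {
    // Local master is an assumption so the example can run on one machine.
    val conf = new SparkConf()
      .setAppName("RowMatrixMultiplySketch")
      .setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val a = sc.parallelize(Seq(Array(1.0, 2.0), Array(3.0, 4.0)))
      val b = Array(Array(5.0, 6.0), Array(7.0, 8.0))
      multiplyByLocal(a, b).collect().foreach(r => println(r.mkString(" ")))
    } finally {
      sc.stop()
    }
  }
}
```

If both A and B are too large for one machine, these two multiply methods are probably not enough on their own, and some other scheme for distributing B would be needed.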