If you're just doing dense matrix multiplication, I would advise that Mahout (or any MapReduce approach) isn't well suited to your problem. I did the same computation in MATLAB (multiplying two 40k x 40k random double-precision dense matrices) using 14 cores and about 36GB of RAM on a single machine*, and it finished in about 55 minutes. If I'm reading your email correctly, you were working with 34*2*4=272 cores! I'm not sure whether dense matrix multiplication can be mapped onto MapReduce efficiently at all, but I am still a rookie, so don't take my word for it.
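
To put rough numbers on that comparison, here is a small Python sketch of the arithmetic. It is only a back-of-the-envelope estimate: it assumes the classic algorithm at roughly 2*n^3 floating-point operations, takes "around 1 hour" as exactly 3600 seconds, and assumes every core was busy the whole time. The variable and function names are mine, not anything from MATLAB or Mahout.

```python
# Back-of-the-envelope throughput for a dense n x n matrix multiply.
# Assumption: classic algorithm, roughly 2 * n^3 floating-point operations.

N = 40_000  # matrix dimension quoted in the thread


def matmul_flops(n):
    """Approximate operation count for multiplying two dense n x n matrices."""
    return 2 * n ** 3


def gflops(n, seconds):
    """Sustained rate in GFLOP/s for an n x n multiply taking `seconds`."""
    return matmul_flops(n) / seconds / 1e9


# Single machine: 14 cores, about 55 minutes.
single_cores = 14
single_total = gflops(N, 55 * 60)
single_per_core = single_total / single_cores

# Cluster: 34 nodes x 2 quad-core CPUs; "around 1 hour" taken as 3600 s.
cluster_cores = 34 * 2 * 4
cluster_total = gflops(N, 60 * 60)
cluster_per_core = cluster_total / cluster_cores

# Memory for A, B and the result C, each N x N doubles (8 bytes per entry).
memory_gib = 3 * N * N * 8 / 2 ** 30

print(f"single machine: {single_total:.1f} GFLOP/s total, "
      f"{single_per_core:.2f} per core")
print(f"cluster:        {cluster_total:.1f} GFLOP/s total, "
      f"{cluster_per_core:.2f} per core")
print(f"memory for three matrices: {memory_gib:.1f} GiB")
```

Under those assumptions both setups sustain a similar total rate (a bit under 40 GFLOP/s), so the cluster is getting roughly one twentieth of the per-core throughput. The three matrices come to just under 36 GiB, which lines up with the memory use I saw.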

*The machine I am working on has 8 dual-core AMD Opteron 875s @ 2.2GHz per core, with 64GB total system memory.

Steven Buss
steven.b...@gmail.com
http://www.stevenbuss.com/

On Sun, Apr 11, 2010 at 11:53 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> Vimal,
>
> We don't have any distributed dense multiplication operations because we
> have not yet found much application demand for distributed dense matrix
> multiplication. Distributed sparse matrix operations are a big deal,
> however.
>
> If you are interested in working on the problem in the context of Mahout, we
> would love to help. This is especially true if you have an application that
> needs dense operations and could benefit from some of the other capabilities
> in Mahout.
>
> On Sun, Apr 11, 2010 at 1:27 PM, Vimal Mathew <vml.mat...@gmail.com> wrote:
>
>> Hi,
>> What's the current state of matrix-matrix multiplication in Mahout?
>> Are there any performance results available for large matrices?
>>
>> I have been working on a Hadoop-compatible distributed storage for
>> matrices. I can currently multiply two 40K x 40K dense double
>> precision matrices in around 1 hour using 34 systems (16GB RAM, two
>> Core2Quads' per node). I was wondering how this compares with Mahout.
>>
>> Regards,
>> Vimal
>>
>