Hi Young,

The problem is one of documentation and poor naming of the method:
DistributedRowMatrix.times(DistributedRowMatrix m) should really be called
DistributedRowMatrix.transposeTimes(DistributedRowMatrix m), because it
computes a.transpose().times(b), not a.times(b). See the javadocs for the
interface method:

http://lucene.apache.org/mahout/javadoc/mahout-math/org/apache/mahout/math/VectorIterable.html

The reason is that the most efficient distributed matrix multiplication for
sparse matrices uses a map-side join. If the first matrix "A" is represented
as a distributed *ROW* matrix, that join computes A.transpose().times(B) in
one map-reduce pass, by pretending the rows are actually columns (i.e. by
taking an O(1) virtual transpose).

To check that you can get what you want, try leaving off the transpose step
in your example when using the distributed matrices, and see if you get the
answer you expect (you should, because this is exactly what we do in the unit
tests for this class).

-jake

On Mon, Apr 12, 2010 at 1:43 AM, Young Y. Kim <yoon...@gmail.com> wrote:

> I'm trying to test DistributedRowMatrix in Eclipse for matrix calculation
> in Hadoop.
> A =
> [[85,68,30,15,50,34],
> [53,38,19,70,90,29],
> [20,83,19,38,82,34],
> [67,50,68,86,64,53],
> [84,71,30,85,82,73],
> [2,43,54,50,66,31]]
>
> DistributedRowMatrix m = DistributedRowMatrix(path,...);
> and when I check the values of m by iterating, it's fine.
> The m.transpose() result was also good.
> But with m.transpose().mult(m), the multiplication result isn't right.
>
> it must be
> >>> A.transpose()*A
> matrix([[21983, 18854, 11121, 18747, 21968, 14852],
> [18854, 22347, 12191, 19319, 25486, 15402],
> [11121, 12191, 10062, 13600, 15144, 9685],
> [18747, 19319, 13600, 23690, 25940, 16145],
> [21968, 25486, 15144, 25940, 32500, 18522],
> [14852, 15402, 9685, 16145, 18522, 12252]])
> (with python)
> but Mahout result is
>
> mTm =
> 0:16702.0 1:19207.0 2:15981.0 3:20949.0 4:24232.0 5:12485.0
> 0:16616.0 1:17762.0 2:15275.0 3:23223.0 4:24111.0 5:14771.0
> 0:8768.0 1:11699.0 2:9418.0 3:14882.0 4:16621.0 5:8957.0
> 0:14415.0 1:19297.0 2:18770.0 3:22575.0 4:25300.0 5:16552.0
> 0:20134.0 1:21402.0 2:21428.0 3:27676.0 4:30032.0 5:19056.0
> 0:11381.0 1:14729.0 2:12787.0 3:16913.0 4:18689.0 5:11580.0
>
> What's the problem?
>
> Thanks.
>
> ps.
> source code is very simple.
> ....
> DistributedRowMatrix m = new DistributedRowMatrix("/tmp/testdata/6x6.mat", "/tmp/testdata/tmpOut", 6, 6);
> m.configure(new JobConf());
>
> System.out.println("original matrix = ");
> printMatrix(m); // matrix printing
>
> DistributedRowMatrix mT = m.transpose();
> System.out.println("mT = ");
> printMatrix(mT);
>
> DistributedRowMatrix mTm = mT.times(m);
> System.out.println("mTm = ");
> printMatrix(mTm);
> ...
> or printMatrix(m.transpose().mult(m));
>
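
P.S. The row-wise argument in the reply can be checked without Hadoop. The sketch below is plain Java, not Mahout code: the class TransposeTimesSketch and its helpers transposeTimes and transpose are hypothetical names made up for illustration. It pairs up row r of the first matrix with row r of the second (roughly what a join on row index would give you) and sums the outer products of each pair, which yields a.transpose() * b. Using the 6x6 matrix from the quoted message, it should give 21983 in the top-left corner when called on (A, A), and 16702 when A is transposed first, as in the original test program.

```java
class TransposeTimesSketch {
    // The 6x6 matrix A from the quoted message.
    static final long[][] A = {
        {85, 68, 30, 15, 50, 34},
        {53, 38, 19, 70, 90, 29},
        {20, 83, 19, 38, 82, 34},
        {67, 50, 68, 86, 64, 53},
        {84, 71, 30, 85, 82, 73},
        { 2, 43, 54, 50, 66, 31}
    };

    // Hypothetical helper (not the Mahout API): computes a^T * b by summing,
    // over every row index r, the outer product of a's row r with b's row r.
    static long[][] transposeTimes(long[][] a, long[][] b) {
        int outRows = a[0].length;
        int outCols = b[0].length;
        long[][] out = new long[outRows][outCols];
        for (int r = 0; r < a.length; r++) {          // one joined pair of rows
            for (int i = 0; i < outRows; i++) {
                for (int j = 0; j < outCols; j++) {
                    out[i][j] += a[r][i] * b[r][j];
                }
            }
        }
        return out;
    }

    // Explicit transpose, used only to mimic the extra m.transpose() step.
    static long[][] transpose(long[][] a) {
        long[][] t = new long[a[0].length][a.length];
        for (int i = 0; i < a.length; i++) {
            for (int j = 0; j < a[0].length; j++) {
                t[j][i] = a[i][j];
            }
        }
        return t;
    }

    public static void main(String[] args) {
        // No explicit transpose: this is already A^T * A.
        long[][] ata = transposeTimes(A, A);
        System.out.println(ata[0][0]); // prints 21983

        // With the extra transpose the two transposes cancel, giving A * A.
        long[][] aa = transposeTimes(transpose(A), A);
        System.out.println(aa[0][0]);  // prints 16702
    }
}
```

The second value, 16702, is also the first entry of the mTm output reported in the quoted message, which is consistent with the extra transpose() being the cause.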