After committing HAMA-142, I finally completed the multiplication of two 10,000 * 10,000 dense matrices. I am pleased with this result. However, a large number of bytes are sent and received over the network between the master and the slaves, and the read operations performed in a loop during multiplication add overhead.
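For a rough sense of the data volume involved, here is a back-of-the-envelope sketch in Python of the 5 * 5 blocking discussed in the quoted messages below. It assumes 8-byte double entries and that each collection-table row stores two blocks (one from a, one from b). Both are assumptions rather than something the thread states, and the function name is made up for illustration.

```python
def collection_table_estimate(n, blocks_per_side, bytes_per_element=8):
    """Estimate task count and collection-table size for a blocked n * n multiply.

    Assumptions (not taken from the Hama source): entries are 8-byte doubles,
    and every c(i, k) += a(i, j) * b(j, k) row stores one a block and one b block.
    """
    block_dim = n // blocks_per_side                    # 10,000 / 5 = 2,000
    block_bytes = block_dim * block_dim * bytes_per_element
    tasks = blocks_per_side ** 3                        # one row per (i, j, k)
    total_bytes = tasks * 2 * block_bytes               # two blocks per row
    return tasks, block_bytes, total_bytes


tasks, block_bytes, total_bytes = collection_table_estimate(10_000, 5)
print(tasks, block_bytes / 1e6, total_bytes / 1e9)      # tasks, MB per block, GB total
```

Under these assumptions a block is 32 MB and the table holds 125 rows of two blocks each, which is one reading that agrees with the "It's 8 GB" correction quoted below.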
BTW, a blocked dense matrix has only a few rows. Hence, it doesn't spread horizontally across the machines; both splits end up on the same host:

09/01/07 17:36:14 INFO mapred.TableInputFormatBase: split: 0->d8g053.nhncorp.com:,000000000000,0,10
09/01/07 17:36:14 INFO mapred.TableInputFormatBase: split: 1->d8g053.nhncorp.com:000000000000,0,10,

/Edward

On Tue, Jan 6, 2009 at 2:11 PM, Edward J. Yoon <[email protected]> wrote:
> Oh, sorry. It's 8 GB.
>
> On Tue, Jan 6, 2009 at 2:05 PM, Edward J. Yoon <[email protected]> wrote:
>> Let's assume matrix a * b of 10,000 * 10,000 dense matrices,
>>
>> 5 * 5 blocks,
>> 1 block is 2000 * 2000 and 16 MB,
>>
>> 0 : c(0, 0) += a(0, 0) * b(0, 0)
>> 1 : c(0, 1) += a(0, 0) * b(0, 1)
>> ...
>> 123 : c(4, 3) += a(4, 4) * b(4, 3)
>> 124 : c(4, 4) += a(4, 4) * b(4, 4)
>>
>> 5^3 * 32 MB = 4 GB.
>>
>> collection table size is 4 GB. Anyway, let's try it.
>>
>> On Tue, Jan 6, 2009 at 12:37 PM, Samuel Guo <[email protected]> wrote:
>>> +1
>>> hmm, it is tricky.
>>>
>>> On Tue, Jan 6, 2009 at 11:04 AM, Edward J. Yoon
>>> <[email protected]> wrote:
>>>
>>>> If we collect blocks to one table during blocking_mapred(), locality
>>>> will be provided and it will be faster.
>>>>
>>>> row Key       column:A    column:B
>>>> c(0, 0) +=    a(0, 0) *   b(0, 0)
>>>> c(0, 0) +=    a(0, 1) *   b(1, 0)
>>>> c(0, 0) +=    a(0, 2) *   b(2, 0)
>>>> c(0, 0) +=    a(0, 3) *   b(3, 0)
>>>> c(0, 1) +=    a(0, 0) *   b(0, 1)
>>>> c(0, 1) +=    a(0, 1) *   b(1, 1)
>>>> ...
>>>>
>>>> What do you think?
>>>>
>>>> On Mon, Jan 5, 2009 at 10:30 AM, Edward J. Yoon <[email protected]>
>>>> wrote:
>>>> > Hama Trunk doesn't work for large matrix multiplication; it fails with
>>>> > mapred.task.timeout and scanner.timeout exceptions. I tried 1,000,000 *
>>>> > 1,000,000 matrix multiplication on 100 nodes. (The rest is fine.)
>>>> >
>>>> > To reduce repeated reads of the same block, I thought of the approach
>>>> > described below. But the work done by each map seems too large.
>>>> >
>>>> > ----
>>>> > // c[i][k] += a[i][j] * b[j][k];
>>>> >
>>>> > map() {
>>>> >   SubMatrix a = value.get();
>>>> >
>>>> >   for (RowResult row : scan) {
>>>> >     collect : c[i][k] = a * b[j][k];
>>>> >   }
>>>> > }
>>>> >
>>>> > reduce() {
>>>> >   c[i][k] += c[i][k];
>>>> > }
>>>> > ----
>>>> >
>>>> > Should we increase {mapred.task.timeout and scanner.timeout}?
>>>> > Or is there any better idea?
>>>> >
>>>> > --
>>>> > Best Regards, Edward J. Yoon @ NHN, corp.
>>>> > [email protected]
>>>> > http://blog.udanax.org

--
Best Regards, Edward J. Yoon @ NHN, corp.
[email protected]
http://blog.udanax.org
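The map()/reduce() pseudocode quoted above can be tried on small matrices without a cluster. The following is only a minimal in-memory sketch in Python, not Hama code: the helper names (split_blocks, map_phase, reduce_phase, blocked_multiply) are made up for illustration, plain lists stand in for the HBase table, and it assumes square matrices whose size is a multiple of the block size.

```python
def split_blocks(m, bs):
    """Cut a square matrix (list of lists) into a dict {(i, j): block}."""
    n = len(m) // bs
    return {(i, j): [row[j * bs:(j + 1) * bs] for row in m[i * bs:(i + 1) * bs]]
            for i in range(n) for j in range(n)}


def matmul(a, b):
    """Plain dense multiplication of two small matrices."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def add(a, b):
    """Element-wise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def map_phase(a_blocks, b_blocks, n):
    """For each a(i, j), scan block row j of b and emit ((i, k), a(i, j) * b(j, k))."""
    for (i, j), a in a_blocks.items():
        for k in range(n):
            yield (i, k), matmul(a, b_blocks[(j, k)])


def reduce_phase(pairs):
    """Sum the partial products that share the same (i, k) key."""
    c = {}
    for key, partial in pairs:
        c[key] = add(c[key], partial) if key in c else partial
    return c


def blocked_multiply(a, b, bs):
    """Multiply a * b block-wise and stitch the c(i, k) blocks back together."""
    n = len(a) // bs
    c_blocks = reduce_phase(map_phase(split_blocks(a, bs), split_blocks(b, bs), n))
    c = [[0] * len(a) for _ in range(len(a))]
    for (i, k), block in c_blocks.items():
        for r, row in enumerate(block):
            c[i * bs + r][k * bs:(k + 1) * bs] = row
    return c


a = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
b = [[2, 0, 1, 3], [1, 1, 0, 2], [0, 4, 2, 1], [3, 2, 1, 0]]
print(blocked_multiply(a, b, 2) == matmul(a, b))
```

With n blocks per side, the map phase emits n^3 partial products (5^3 = 125 for the 5 * 5 case in the thread), which is why each map's share of work and the repeated block reads grow quickly with the matrix size.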
