If we collect the blocks into one table during blocking_mapred(), we get
data locality, and the multiplication should be faster.

row Key   column:A   column:B
c(0, 0) += a(0, 0) * b(0, 0)
c(0, 0) += a(0, 1) * b(1, 0)
c(0, 0) += a(0, 2) * b(2, 0)
c(0, 0) += a(0, 3) * b(3, 0)
c(0, 1) += a(0, 0) * b(0, 1)
c(0, 1) += a(0, 1) * b(1, 1)
...
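
To make the layout above concrete, here is a rough single-process sketch in
Python. It is not Hama code: split_blocks, collect_table, map_phase,
reduce_phase and blocked_multiply are hypothetical names, the "table" is a
plain dict, and it assumes square matrices whose size is a multiple of the
block size. Each row holds a(i, j) and b(j, k) side by side, so a map task
only has to read its own row.

```python
def split_blocks(m, bs):
    """Split a square matrix (list of lists) into bs x bs blocks keyed by (block_row, block_col)."""
    n = len(m)
    return {(bi // bs, bj // bs): [row[bj:bj + bs] for row in m[bi:bi + bs]]
            for bi in range(0, n, bs) for bj in range(0, n, bs)}

def collect_table(a_blocks, b_blocks, nb):
    """Hypothetical collected table: one row per (i, k, j), with a(i, j) in column A and b(j, k) in column B."""
    return {(i, k, j): {"A": a_blocks[(i, j)], "B": b_blocks[(j, k)]}
            for i in range(nb) for k in range(nb) for j in range(nb)}

def multiply(x, y):
    """Naive dense product of two matrices given as lists of lists."""
    return [[sum(x[r][t] * y[t][c] for t in range(len(y)))
             for c in range(len(y[0]))] for r in range(len(x))]

def map_phase(table):
    """Each row is self-contained: emit ((i, k), a(i, j) * b(j, k))."""
    for (i, k, _j), row in table.items():
        yield (i, k), multiply(row["A"], row["B"])

def reduce_phase(pairs):
    """Sum the partial products that share the same (i, k) key."""
    acc = {}
    for key, block in pairs:
        if key not in acc:
            acc[key] = [r[:] for r in block]
        else:
            acc[key] = [[p + q for p, q in zip(r1, r2)]
                        for r1, r2 in zip(acc[key], block)]
    return acc

def blocked_multiply(a, b, bs):
    """Run collect -> map -> reduce and stitch the c(i, k) blocks back together."""
    nb = len(a) // bs
    table = collect_table(split_blocks(a, bs), split_blocks(b, bs), nb)
    c_blocks = reduce_phase(map_phase(table))
    out = []
    for i in range(nb):
        for r in range(bs):
            row = []
            for k in range(nb):
                row.extend(c_blocks[(i, k)][r])
            out.append(row)
    return out
```

The trade-off in this sketch is that every block of a and b is copied nb
times into the collected table, in exchange for each map input being a single
row read.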

What do you think?

On Mon, Jan 5, 2009 at 10:30 AM, Edward J. Yoon <[email protected]> wrote:
> Hama trunk fails on large matrix multiplications with
> mapred.task.timeout and scanner.timeout exceptions. I tried a 1,000,000 *
> 1,000,000 matrix multiplication on 100 nodes. (Everything else works fine.)
>
> To reduce repeated reads of the same block, I came up with the approach
> described below. However, the work done by each map task seems too large.
>
> ----
> // c[i][k] += a[i][j] * b[j][k];
>
> map() {
>  SubMatrix a = value.get();
>
>  for (RowResult row : scan) {
>     collect : c[i][k] = a * b[j][k];
>  }
> }
>
> reduce() {
>  c[i][k] += c[i][k];
> }
> ----
>
> Should we increase mapred.task.timeout and scanner.timeout,
> or does anyone have a better idea?
>
> --
> Best Regards, Edward J. Yoon @ NHN, corp.
> [email protected]
> http://blog.udanax.org
>



-- 
Best Regards, Edward J. Yoon @ NHN, corp.
[email protected]
http://blog.udanax.org
