Re: Ideal number of mappers and reducers; any physical limits?

Ted Dunning Tue, 28 Oct 2008 22:13:32 -0700

On Tue, Oct 28, 2008 at 5:15 PM, Edward J. Yoon <[EMAIL PROTECTED]>wrote:


> ...
> In single machine, as far as we
> know graph can be stored to linked list or matrix.
>

Since the matrix is normally very sparse for large graphs, these two
approaches are pretty similar.


> ... So, I guess google's web graph will be stored as a matrix in a
> bigTable.
>

I doubt that it is stored as an explicit matrix.  Each page would probably
have a big table (or file) entry and would have a list of links including
link text.


Have you seen my 2D block algorithm post?? --
> http://blog.udanax.org/2008/10/parallel-matrix-multiply-on-hadoop.html
>

I have now.  Block decomposition for multiplies almost always applies only
to dense matrix operations.  For most sparse matrix representations
extracting a block is only efficient if it is full width or height.  For
very sparse matrix operations, the savings due to reuse of intermediate
results are completely dominated by the I/O cost so block decompositions are
much less helpful.

In many cases, it isn't even very helpful to send around entire rows and
sending individual elements is about as efficient.

FYI, Hama (http://incubator.apache.org/hama/) will be handled graph
> algorithms since it is a related with adjacency matrix and topological
> algebra. And I think 2000 node hadoop/hbase cluster is big enough if a
> sequential/random read/write speed will be improved 800%. :-)
>

I think that a 5 node cluster is big enough without any improvement in
read/write speed.

Of course, it depends on the size of the problem.  I was only working with a
matrix with a few tens of billions of non-zero values.

Re: Ideal number of mappers and reducers; any physical limits?

Reply via email to