On Tue, Oct 28, 2008 at 5:15 PM, Edward J. Yoon <[EMAIL PROTECTED]>wrote:
> ... > In single machine, as far as we > know graph can be stored to linked list or matrix. > Since the matrix is normally very sparse for large graphs, these two approaches are pretty similar. > ... So, I guess google's web graph will be stored as a matrix in a > bigTable. > I doubt that it is stored as an explicit matrix. Each page would probably have a big table (or file) entry and would have a list of links including link text. Have you seen my 2D block algorithm post?? -- > http://blog.udanax.org/2008/10/parallel-matrix-multiply-on-hadoop.html > I have now. Block decomposition for multiplies almost always applies only to dense matrix operations. For most sparse matrix representations extracting a block is only efficient if it is full width or height. For very sparse matrix operations, the savings due to reuse of intermediate results are completely dominated by the I/O cost so block decompositions are much less helpful. In many cases, it isn't even very helpful to send around entire rows and sending individual elements is about as efficient. FYI, Hama (http://incubator.apache.org/hama/) will be handled graph > algorithms since it is a related with adjacency matrix and topological > algebra. And I think 2000 node hadoop/hbase cluster is big enough if a > sequential/random read/write speed will be improved 800%. :-) > I think that a 5 node cluster is big enough without any improvement in read/write speed. Of course, it depends on the size of the problem. I was only working with a matrix with a few tens of billions of non-zero values.
