Dunning,

Currently, most of the matrices we are dealing with (graph adjacency matrices, document-word matrices) are sparse. Matrix decomposition often needs many iterations to converge, and the intermediate results have to be saved to serve as the input for the next iteration. This is very inefficient. As I mentioned, the BFS algorithm written in Python took 1 second to run, while Hadoop took almost 3 hours. I would expect Hadoop to be slower, but not this slow. All the Hadoop applications I have seen are very simple calculations, so I wonder how it can be applied to machine learning/data mining algorithms. Is HAMA the only way to solve this? If it is not ready to use yet, can I assume that Hadoop is not a good solution for multi-iteration algorithms right now?
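For concreteness, the kind of BFS I mean is just a level-by-level expansion like the sketch below (the function name, adjacency-list format, and tiny example graph are only illustrative, not my actual script or data). On Hadoop the same loop becomes one full MapReduce job per level, with the partial distances written out between jobs, which is where the iteration overhead comes from.

from collections import deque

def bfs_levels(adj, source):
    # adj: dict mapping each node to a list of neighbors (kept in memory).
    # Returns the hop distance from `source` for every reachable node.
    dist = {source: 0}
    frontier = deque([source])
    while frontier:
        u = frontier.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                frontier.append(v)
    return dist

# Tiny illustrative graph; a Hadoop version would run one MapReduce job
# per level and write the partial distances to HDFS between iterations.
adj = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(bfs_levels(adj, "a"))   # {'a': 0, 'b': 1, 'c': 1, 'd': 2}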
Wei

-----Original Message-----
From: Ted Dunning [mailto:[email protected]]
Sent: Monday, December 20, 2010 9:03 PM
To: [email protected]
Subject: Re: breadth-first search

On Mon, Dec 20, 2010 at 8:16 PM, Peng, Wei <[email protected]> wrote:

> ... My question is really about what is the efficient way for graph
> computation, matrix computation, algorithms that need many iterations to
> converge (with intermediate results).
>

Large graph computations usually assume a sparse graph for historical
reasons.

A key property of scalable algorithms is that the time and space are linear
in the input size. Most all-paths algorithms are not linear because the
result is n x n and is dense.

Some graph path computations can be done indirectly by spectral methods.
With good random projection algorithms for sparse matrix decomposition,
approximate versions of some of these algorithms can be phrased in a
scalable fashion. It isn't an easy task, however.

> HAMA looks like a very good solution, but can we use it now and how to
> use it?
>

I don't think that Hama has produced any usable software yet.
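For reference, the random projection approach to approximate sparse matrix decomposition mentioned above looks roughly like the single-machine sketch below. The function name, the oversampling and power-iteration parameters, and the use of numpy/scipy are illustrative choices, not a prescribed implementation; a scalable version would replace the matrix products with distributed passes over the sparse data.

import numpy as np
from scipy import sparse

def randomized_svd(A, rank, oversample=10, n_iter=2, seed=0):
    # Approximate rank-`rank` SVD of a (sparse) matrix A via random projection.
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    k = rank + oversample

    # Random projection: a thin dense sketch that captures the range of A.
    Omega = rng.standard_normal((n, k))
    Y = A.dot(Omega)                    # only touches the nonzeros of A

    # A few power iterations sharpen the basis when the spectrum decays slowly.
    for _ in range(n_iter):
        Y = A.dot(A.T.dot(Y))

    Q, _ = np.linalg.qr(Y)              # orthonormal basis for the approximate range

    # Project A into the small subspace and run an exact SVD there.
    B = A.T.dot(Q).T                    # k x n, dense but small
    Uhat, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q.dot(Uhat)
    return U[:, :rank], s[:rank], Vt[:rank, :]

# Example on a random sparse "document-word" style matrix.
A = sparse.random(10000, 5000, density=0.001, format="csr", random_state=1)
U, s, Vt = randomized_svd(A, rank=20)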
