Dunning,

Currently, most of the matrices we are dealing with (graph adjacency matrices, document-word matrices) are sparse. Matrix decomposition often needs many iterations to converge, and the intermediate results have to be saved to serve as the input for the next iteration. This is very inefficient. As I mentioned, the BFS algorithm written in Python took 1 second to run, while Hadoop took almost 3 hours. I would expect Hadoop to be slower, but not this slow. All the Hadoop applications I have seen are very simple calculations, so I wonder how it can be applied to machine learning/data mining algorithms. Is HAMA the only way to solve this? If it is not ready to use yet, can I assume that Hadoop is not a good solution for multi-iteration algorithms right now?
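For concreteness, the kind of BFS I mean is just a level-by-level expansion like the sketch below (the function name, adjacency-list format, and tiny example graph are only illustrative, not my actual script or data). On Hadoop the same loop becomes one full MapReduce job per level, with the partial distances written out between jobs, which is where the iteration overhead comes from.

from collections import deque

def bfs_levels(adj, source):
    # adj: dict mapping each node to a list of neighbors (kept in memory).
    # Returns the hop distance from `source` for every reachable node.
    dist = {source: 0}
    frontier = deque([source])
    while frontier:
        u = frontier.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                frontier.append(v)
    return dist

# Tiny illustrative graph; a Hadoop version would run one MapReduce job
# per level and write the partial distances to HDFS between iterations.
adj = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(bfs_levels(adj, "a"))   # {'a': 0, 'b': 1, 'c': 1, 'd': 2}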
Wei

-----Original Message-----
From: Ted Dunning [mailto:[email protected]]
Sent: Monday, December 20, 2010 9:03 PM
To: [email protected]
Subject: Re: breadth-first search

On Mon, Dec 20, 2010 at 8:16 PM, Peng, Wei <[email protected]> wrote:

> ... My question is really about what is the efficient way for graph
> computation, matrix computation, algorithms that need many iterations to
> converge (with intermediate results).
>

Large graph computations usually assume a sparse graph for historical
reasons.

A key property of scalable algorithms is that the time and space are linear
in the input size. Most all-paths algorithms are not linear because the
result is n x n and is dense.

Some graph path computations can be done indirectly by spectral methods.
With good random projection algorithms for sparse matrix decomposition,
approximate versions of some of these algorithms can be phrased in a
scalable fashion. It isn't an easy task, however.

> HAMA looks like a very good solution, but can we use it now and how to
> use it?
>

I don't think that Hama has produced any usable software yet.
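For reference, the random projection approach to approximate sparse matrix decomposition mentioned above looks roughly like the single-machine sketch below. The function name, the oversampling and power-iteration parameters, and the use of numpy/scipy are illustrative choices, not a prescribed implementation; a scalable version would replace the matrix products with distributed passes over the sparse data.

import numpy as np
from scipy import sparse

def randomized_svd(A, rank, oversample=10, n_iter=2, seed=0):
    # Approximate rank-`rank` SVD of a (sparse) matrix A via random projection.
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    k = rank + oversample

    # Random projection: a thin dense sketch that captures the range of A.
    Omega = rng.standard_normal((n, k))
    Y = A.dot(Omega)                    # only touches the nonzeros of A

    # A few power iterations sharpen the basis when the spectrum decays slowly.
    for _ in range(n_iter):
        Y = A.dot(A.T.dot(Y))

    Q, _ = np.linalg.qr(Y)              # orthonormal basis for the approximate range

    # Project A into the small subspace and run an exact SVD there.
    B = A.T.dot(Q).T                    # k x n, dense but small
    Uhat, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q.dot(Uhat)
    return U[:, :rank], s[:rank], Vt[:rank, :]

# Example on a random sparse "document-word" style matrix.
A = sparse.random(10000, 5000, density=0.001, format="csr", random_state=1)
U, s, Vt = randomized_svd(A, rank=20)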
