I was referring to the class of "classical" graph algorithms (like shortest path, min cut, betweenness, triangle enumeration, etc.) that Jake was also talking about.

--sebastian

On 02.11.2011 13:13, Dan Brickley wrote:
> On 2 November 2011 10:24, Sebastian Schelter <[email protected]> wrote:
>> As you might know I recently started an experimental graph mining
>> module. I was already concerned at the beginning of this whether
>> MapReduce is really a suitable platform for (most) graph algorithms.
>>
>> I'm not content with the performance of the algorithms after some
>> testing and I'm pretty sure the future of large scale graph processing
>> is not on MapReduce (but hopefully on a Pregel like platform such as
>> Giraph).
>>
>> As we're currently removing clutter and trying to concentrate on the
>> core algorithms, I suggest to remove all graph algorithms with the
>> exception of PageRank.
>>
>> If no one objects with this, I'll start the cleanup in a few days.
>
> It all depends what you mean by 'graph algorithms', as Jake more or
> less says. I take your point re shortest paths etc. However it would
> be a mistake I think to send out a message that Mahout isn't good for
> consuming graph data, even while Hadoop certainly has issues with some
> kinds of graph-processing.
>
> All this can be something of a matter of perspective and descriptive
> gloss. Much of the work of the recommender / Taste component of Mahout
> can be thought of (and marketed as?) consuming a specialist flavour of
> graph data. Something like an 'interest graph' (a
> http://en.wikipedia.org/wiki/Bipartite_graph) where the nodes are
> items or users, and the affinities/associations are indications of
> interest (possibly date-stamped, possibly weighted).
>
> I work a lot with factual graph data expressed in W3C RDF form; in
> this case our 'graph' has nodes that are entities or atomic values,
> and links that are different typed links, representing relationship
> types, attributes/properties etc. Depending on the task in hand this
> can be consumed in Mahout by munging it into recommendations format
> input, or as with CSV input, into vectors, etc. So again it's 'graph'
> data processing even if the processing paradigm isn't from graph
> theory.
>
> Finally the spectral clustering piece of Mahout also takes graph input
> (affinities) and there are decades of research papers that account for
> this in terms of eigenvectors/values of laplacian representations of
> the graph affinity matrix; so I'd also count that as a Mahout tool for
> (I guess 'lossy' in Jake's terminology) graph processing.
>
> Or am I being too marketing-minded here? Is it fair to say "Mahout is
> a toolkit that can do specific useful things with various forms of
> graph-shaped data, but isn't a general-purpose graph processing
> environment"?
>
> Dan
