Perhaps it would make sense to move them to a branch? I know we never released them, but it seems a shame for them to be buried in SVN history.
Perhaps we should have an "attic" branch or a "sandbox" branch, where things like this (and Watchmaker) can go to age without necessarily being relegated to the big bit bucket in the sky that is previous revisions. I suspect it will be easier for someone to pick the code up again, should things improve later, than to dig it out of SVN history.

Besides, one should be careful about drawing conclusions on performance just yet, given the state of Hadoop. As the overhead issues get worked through, some of this stuff may not look as bad. I guess the real question is: is it the paradigm that is slow, or the implementation of the paradigm? (That being said, I do think it's likely that this stuff moves to Giraph.)
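For concreteness, here is roughly what the vertex-centric ("think like a vertex") model buys you -- a toy single-source shortest paths sketched in plain Java (not Giraph's actual API), with the graph assumed to fit in memory and edge weights assumed nonnegative:

import java.util.HashMap;
import java.util.Map;

/**
 * Single-source shortest paths in a Pregel-like, vertex-centric style.
 * Illustrative sketch only (Giraph's real API differs). Assumes every
 * vertex appears as a key in the graph map and weights are nonnegative.
 */
public class VertexCentricSketch {

  public static Map<Integer, Double> shortestPaths(
      Map<Integer, Map<Integer, Double>> graph, int source) {
    Map<Integer, Double> dist = new HashMap<Integer, Double>();
    for (Integer v : graph.keySet()) {
      dist.put(v, Double.POSITIVE_INFINITY);
    }
    // superstep 0: the source receives an offer of distance 0
    Map<Integer, Double> inbox = new HashMap<Integer, Double>();
    inbox.put(source, 0.0);

    while (!inbox.isEmpty()) {           // run until no messages in flight
      Map<Integer, Double> outbox = new HashMap<Integer, Double>();
      for (Map.Entry<Integer, Double> msg : inbox.entrySet()) {
        int vertex = msg.getKey();
        double offered = msg.getValue();
        if (offered < dist.get(vertex)) {  // the vertex's "compute()" step:
          dist.put(vertex, offered);       // adopt the better distance and
          for (Map.Entry<Integer, Double> e : graph.get(vertex).entrySet()) {
            double candidate = offered + e.getValue();  // offer it onward
            Double pending = outbox.get(e.getKey());
            if (pending == null || candidate < pending) {
              outbox.put(e.getKey(), candidate);
            }
          }
        }
      }
      inbox = outbox;                    // the barrier between supersteps
    }
    return dist;
  }
}

The vertex state (dist) survives across supersteps and only the messages move between them; a MapReduce implementation has to serialize the whole graph back to HDFS on every iteration, which is presumably where much of the overhead Sebastian is measuring comes from.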
On Nov 2, 2011, at 8:49 AM, Sebastian Schelter wrote:

> I was referring to the class of "classical" graph algorithms (like
> shortest path, min cut, betweenness, triangle enumeration, etc.) that
> Jake was also talking about.
>
> --sebastian
>
> On 02.11.2011 13:13, Dan Brickley wrote:
>> On 2 November 2011 10:24, Sebastian Schelter <[email protected]> wrote:
>>> As you might know, I recently started an experimental graph mining
>>> module. I was concerned from the beginning about whether MapReduce
>>> is really a suitable platform for (most) graph algorithms.
>>>
>>> I'm not content with the performance of the algorithms after some
>>> testing, and I'm pretty sure the future of large-scale graph
>>> processing is not on MapReduce (but hopefully on a Pregel-like
>>> platform such as Giraph).
>>>
>>> As we're currently removing clutter and trying to concentrate on the
>>> core algorithms, I suggest removing all graph algorithms with the
>>> exception of PageRank.
>>>
>>> If no one objects, I'll start the cleanup in a few days.
>>
>> It all depends what you mean by 'graph algorithms', as Jake more or
>> less says. I take your point re shortest paths etc. However, I think
>> it would be a mistake to send out the message that Mahout isn't good
>> for consuming graph data, even though Hadoop certainly has issues
>> with some kinds of graph processing.
>>
>> All this can be something of a matter of perspective and descriptive
>> gloss. Much of the work of the recommender / Taste component of
>> Mahout can be thought of (and marketed as?) consuming a specialist
>> flavour of graph data: something like an 'interest graph' (a
>> bipartite graph, http://en.wikipedia.org/wiki/Bipartite_graph),
>> where the nodes are items or users and the affinities/associations
>> are indications of interest (possibly date-stamped, possibly
>> weighted).
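That mapping is quite literal: each weighted user-item edge of such an interest graph is one line of a Taste data file. A minimal sketch against the Taste API (the CSV file name here is invented for illustration):

import java.io.File;

import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

public class InterestGraphExample {
  public static void main(String[] args) throws Exception {
    // interest-graph.csv holds one bipartite edge per line:
    //   userID,itemID,weight   e.g.  42,1001,3.5
    DataModel model = new FileDataModel(new File("interest-graph.csv"));

    UserSimilarity similarity = new LogLikelihoodSimilarity(model);
    Recommender recommender = new GenericUserBasedRecommender(
        model, new NearestNUserNeighborhood(10, similarity, model), similarity);

    // "which items is user 42 likely to grow an edge to next?"
    for (RecommendedItem item : recommender.recommend(42L, 5)) {
      System.out.println(item.getItemID() + " : " + item.getValue());
    }
  }
}

LogLikelihoodSimilarity is just one choice here; any UserSimilarity runs over the same edge data.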
>> I work a lot with factual graph data expressed in W3C RDF form; in
>> this case our 'graph' has nodes that are entities or atomic values,
>> and links that are typed, representing relationship types,
>> attributes/properties, etc. Depending on the task at hand, this can
>> be consumed in Mahout by munging it into recommender-format input
>> or, as with CSV input, into vectors. So again it's 'graph' data
>> processing, even if the processing paradigm isn't from graph theory.
>>
>> Finally, the spectral clustering piece of Mahout also takes graph
>> input (affinities), and there are decades of research papers that
>> account for this in terms of the eigenvectors/eigenvalues of
>> Laplacian representations of the graph affinity matrix (sketched
>> briefly at the end of this message); so I'd also count that as a
>> Mahout tool for (I guess 'lossy', in Jake's terminology) graph
>> processing.
>>
>> Or am I being too marketing-minded here? Is it fair to say "Mahout
>> is a toolkit that can do specific useful things with various forms
>> of graph-shaped data, but isn't a general-purpose graph processing
>> environment"?
>>
>> Dan

--------------------------------------------
Grant Ingersoll
http://www.lucidimagination.com
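To make the "Laplacian representations" mentioned above concrete: the graph step of spectral clustering amounts to building L = D - A from the affinity matrix A, where D is the diagonal degree matrix; the eigenvectors for the smallest eigenvalues of L are what ultimately get handed to k-means. A toy sketch of the unnormalized form (Mahout's implementation uses a normalized variant, if I recall, but the idea is the same):

/**
 * The graph step of spectral clustering in miniature: given a symmetric
 * affinity matrix (affinity[i][j] = edge weight between i and j), build
 * the unnormalized Laplacian L = D - A, where D is the diagonal degree
 * matrix. The eigensolve and k-means stages are omitted.
 */
public class LaplacianSketch {
  public static double[][] laplacian(double[][] affinity) {
    int n = affinity.length;
    double[][] l = new double[n][n];
    for (int i = 0; i < n; i++) {
      double degree = 0.0;
      for (int j = 0; j < n; j++) {
        degree += affinity[i][j];        // weighted degree = row sum
      }
      for (int j = 0; j < n; j++) {
        l[i][j] = (i == j ? degree : 0.0) - affinity[i][j];
      }
    }
    return l;
  }
}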