Regarding the graph code, I would suggest just removing it. I doubt anyone will continue to work on that branch.
--sebastian

On 02.11.2011 14:38, Grant Ingersoll wrote:
> Perhaps it would make sense to move them to a branch? I know we never
> released them, but it seems a shame for them to be buried in SVN history.
>
> Perhaps we should have an "attic" branch or a "sandbox" branch, where things
> like this (and Watchmaker) can go to age w/o necessarily being relegated to
> the big bitbucket in the sky that is previous revisions. I suspect it will
> be easier for someone to pick up should things improve later than having to
> dig through SVN history. Besides, one might be careful about drawing
> conclusions on performance just yet given the state of Hadoop. As the
> overhead issues get worked through, some of this stuff may not be as bad. I
> guess the question is, is it the paradigm that is slow or the implementation
> of the paradigm? (That being said, I do think it's likely the case that this
> stuff moves to Giraph.)
>
>
> On Nov 2, 2011, at 8:49 AM, Sebastian Schelter wrote:
>
>> I was referring to the class of "classical" graph algorithms (like
>> shortest path, min cut, betweenness, triangle enumeration, etc.) that Jake
>> was also talking about.
>>
>> --sebastian
>>
>> On 02.11.2011 13:13, Dan Brickley wrote:
>>> On 2 November 2011 10:24, Sebastian Schelter <[email protected]> wrote:
>>>> As you might know, I recently started an experimental graph mining
>>>> module. I was already concerned at the beginning of this whether
>>>> MapReduce is really a suitable platform for (most) graph algorithms.
>>>>
>>>> I'm not content with the performance of the algorithms after some
>>>> testing, and I'm pretty sure the future of large-scale graph processing
>>>> is not on MapReduce (but hopefully on a Pregel-like platform such as
>>>> Giraph).
>>>>
>>>> As we're currently removing clutter and trying to concentrate on the
>>>> core algorithms, I suggest removing all graph algorithms with the
>>>> exception of PageRank.
>>>>
>>>> If no one objects to this, I'll start the cleanup in a few days.
>>>
>>> It all depends on what you mean by 'graph algorithms', as Jake more or
>>> less says. I take your point re shortest paths etc. However, it would
>>> be a mistake, I think, to send out a message that Mahout isn't good for
>>> consuming graph data, even while Hadoop certainly has issues with some
>>> kinds of graph processing.
>>>
>>> All this can be something of a matter of perspective and descriptive
>>> gloss. Much of the work of the recommender / Taste component of Mahout
>>> can be thought of (and marketed as?) consuming a specialist flavour of
>>> graph data: something like an 'interest graph' (a
>>> http://en.wikipedia.org/wiki/Bipartite_graph) where the nodes are
>>> items or users, and the affinities/associations are indications of
>>> interest (possibly date-stamped, possibly weighted).
>>>
>>> I work a lot with factual graph data expressed in W3C RDF form; in
>>> this case our 'graph' has nodes that are entities or atomic values,
>>> and links of different types, representing relationship
>>> types, attributes/properties etc. Depending on the task in hand, this
>>> can be consumed in Mahout by munging it into recommender-format
>>> input, or, as with CSV input, into vectors, etc. So again it's 'graph'
>>> data processing even if the processing paradigm isn't from graph
>>> theory.
>>>
>>> Finally, the spectral clustering piece of Mahout also takes graph input
>>> (affinities), and there are decades of research papers that account for
>>> this in terms of eigenvectors/eigenvalues of Laplacian representations of
>>> the graph affinity matrix; so I'd also count that as a Mahout tool for
>>> (I guess 'lossy' in Jake's terminology) graph processing.
>>>
>>> Or am I being too marketing-minded here? Is it fair to say "Mahout is
>>> a toolkit that can do specific useful things with various forms of
>>> graph-shaped data, but isn't a general-purpose graph processing
>>> environment"?
>>>
>>> Dan
>>
>
> --------------------------------------------
> Grant Ingersoll
> http://www.lucidimagination.com
