Re: Goodbye graph algorithms

Sebastian Schelter Wed, 02 Nov 2011 05:50:07 -0700

I was referring to the class of "classical" graph algorithms (like
shortest path, min cut, betweenness, triangle enumeration, etc.) that Jake
was also talking about.
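To make the contrast concrete, here is a minimal single-machine sketch of shortest path in a Pregel-style (vertex-centric, superstep-based) model like the one Giraph follows. It rests on simplifying assumptions: an in-memory dict graph, non-negative edge weights, and a function name (`pregel_sssp`) made up for this illustration. It is not Giraph's API. Each loop iteration is one superstep. In a MapReduce formulation each iteration would typically be a separate job that re-reads and reshuffles the whole graph, which is a large part of the overhead.

```python
import math

def pregel_sssp(edges, source):
    """Vertex-centric single-source shortest paths, simulated on one machine.

    edges: dict mapping vertex -> list of (neighbor, weight), weights >= 0.
    Each loop iteration is one "superstep": every vertex that received
    messages keeps the smallest candidate distance and, if that improved its
    current distance, sends dist + weight to its out-neighbors. The run halts
    when a superstep sends no messages.
    """
    vertices = set(edges)
    for targets in edges.values():
        vertices.update(v for v, _ in targets)
    dist = {v: math.inf for v in vertices}
    inbox = {source: [0.0]}  # the source starts with distance 0
    while inbox:
        outbox = {}
        for v, msgs in inbox.items():
            best = min(msgs)
            if best < dist[v]:
                dist[v] = best
                for u, w in edges.get(v, []):
                    outbox.setdefault(u, []).append(best + w)
        inbox = outbox  # messages are delivered in the next superstep
    return dist

graph = {"a": [("b", 1.0), ("c", 4.0)],
         "b": [("c", 2.0), ("d", 6.0)],
         "c": [("d", 3.0)]}
print(pregel_sssp(graph, "a"))
```

Only vertices whose distance changed do any work in the next superstep, which is why this style suits iterative graph algorithms better than rerunning a full job per iteration.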


--sebastian

On 02.11.2011 13:13, Dan Brickley wrote:
> On 2 November 2011 10:24, Sebastian Schelter <[email protected]> wrote:
>> As you might know I recently started an experimental graph mining
>> module. I was already concerned at the beginning of this whether
>> MapReduce is really a suitable platform for (most) graph algorithms.
>>
>> I'm not content with the performance of the algorithms after some
>> testing and I'm pretty sure the future of large scale graph processing
>> is not on MapReduce (but hopefully on a Pregel like platform such as
>> Giraph).
>>
>> As we're currently removing clutter and trying to concentrate on the
>> core algorithms, I suggest removing all graph algorithms with the
>> exception of PageRank.
>>
>> If no one objects to this, I'll start the cleanup in a few days.
> 
> It all depends on what you mean by 'graph algorithms', as Jake more or
> less says. I take your point re shortest paths etc. However, it would
> be a mistake, I think, to send out a message that Mahout isn't good for
> consuming graph data, even while Hadoop certainly has issues with some
> kinds of graph-processing.
> 
> All this can be something of a matter of perspective and descriptive
> gloss. Much of the work of the recommender / Taste component of Mahout
> can be thought of (and marketed as?) consuming a specialist flavour of
> graph data. Something like an 'interest graph' (a bipartite graph, see
> http://en.wikipedia.org/wiki/Bipartite_graph) where the nodes are
> items or users, and the affinities/associations are indications of
> interest (possibly date-stamped, possibly weighted).
> 
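As a rough illustration of this point, preference data can be read directly as the edges of a bipartite user/item graph, and an item co-occurrence count is simply the number of two-hop item-user-item paths. This is a hypothetical sketch with made-up data and a made-up function name; it is not Mahout's implementation, only the kind of graph-shaped signal an item-based recommender can work from.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical (user, item, weight) interest edges of a bipartite graph.
edges = [
    ("alice", "item1", 5.0), ("alice", "item2", 3.0),
    ("bob", "item1", 4.0), ("bob", "item2", 1.0), ("bob", "item3", 2.0),
    ("carol", "item2", 4.0), ("carol", "item3", 5.0),
]

def item_cooccurrence(edges):
    """Count, for each pair of items, how many users link to both.

    Each count equals the number of length-2 paths item -> user -> item in
    the bipartite graph. Edge weights are ignored in this sketch.
    """
    items_by_user = defaultdict(set)
    for user, item, _weight in edges:
        items_by_user[user].add(item)
    counts = defaultdict(int)
    for items in items_by_user.values():
        for a, b in combinations(sorted(items), 2):
            counts[(a, b)] += 1
    return dict(counts)

print(item_cooccurrence(edges))
# → {('item1', 'item2'): 2, ('item1', 'item3'): 1, ('item2', 'item3'): 2}
```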
> I work a lot with factual graph data expressed in W3C RDF form; in
> this case our 'graph' has nodes that are entities or atomic values,
> and links of different types, representing relationship
> types, attributes/properties etc.  Depending on the task at hand this
> can be consumed in Mahout by munging it into recommendations format
> input, or as with CSV input, into vectors, etc. So again it's 'graph'
> data processing even if the processing paradigm isn't from graph
> theory.
> 
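A minimal sketch of the kind of munging described above. It assumes the triples have already been parsed into Python tuples, and that the target is roughly the comma-separated userID,itemID[,preference] layout with numeric IDs that Mahout's FileDataModel reads. The example URIs, the function name, and the choice of foaf:interest as the "preference" link are illustrative assumptions.

```python
import csv
import io

FOAF_INTEREST = "http://xmlns.com/foaf/0.1/interest"

# Hypothetical parsed triples (subject, predicate, object); a real pipeline
# would obtain these from an RDF parser.
triples = [
    ("http://example.org/people/alice", FOAF_INTEREST, "http://example.org/topics/hadoop"),
    ("http://example.org/people/alice", "http://xmlns.com/foaf/0.1/name", "Alice"),
    ("http://example.org/people/bob", FOAF_INTEREST, "http://example.org/topics/hadoop"),
    ("http://example.org/people/bob", FOAF_INTEREST, "http://example.org/topics/rdf"),
]

def triples_to_preferences(triples, predicate):
    """Keep triples with the given predicate and map URIs to numeric IDs.

    Returns (csv_text, user_ids, item_ids). Rows are userID,itemID with no
    preference column (boolean preferences).
    """
    user_ids, item_ids = {}, {}
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for s, p, o in triples:
        if p != predicate:
            continue  # other typed links (names, attributes) are dropped
        uid = user_ids.setdefault(s, len(user_ids) + 1)
        iid = item_ids.setdefault(o, len(item_ids) + 1)
        writer.writerow([uid, iid])
    return out.getvalue(), user_ids, item_ids

text, users, items = triples_to_preferences(triples, FOAF_INTEREST)
print(text, end="")
# → 1,1
#   2,1
#   2,2
```

The ID maps are kept so recommendations can be translated back to URIs afterwards.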
> Finally the spectral clustering piece of Mahout also takes graph input
> (affinities) and there are decades of research papers that account for
> this in terms of eigenvectors/values of Laplacian representations of
> the graph affinity matrix; so I'd also count that as a Mahout tool for
> (I guess 'lossy' in Jake's terminology) graph processing.
> 
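The standard identity behind the Laplacian view, written out for the unnormalized Laplacian. Mahout's spectral clustering may well use a normalized variant; this is only meant to show why eigenvectors of the affinity-matrix Laplacian relate to graph cuts.

```latex
% A: symmetric, non-negative affinity matrix;
% D = diag(d_1, ..., d_n) with d_i = \sum_j a_{ij}
L = D - A, \qquad
x^\top L x = \tfrac{1}{2} \sum_{i,j} a_{ij} \, (x_i - x_j)^2 .
% With the indicator x_i = +1 for i \in S and x_i = -1 otherwise:
x^\top L x = 4 \, \mathrm{cut}(S, \bar{S}), \qquad
\mathrm{cut}(S, \bar{S}) = \sum_{i \in S,\; j \notin S} a_{ij} .
```

Since L times the all-ones vector is zero, relaxing x to real values with x orthogonal to the all-ones vector and fixed norm makes the minimizer the eigenvector of the second-smallest eigenvalue (the Fiedler vector); the signs of its entries give an approximately balanced two-way cut.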
> Or am I being too marketing-minded here? Is it fair to say "Mahout is
> a toolkit that can do specific useful things with various forms of
> graph-shaped data, but isn't a general-purpose graph processing
> environment"?
> 
> Dan
