Re: Goodbye graph algorithms

Sebastian Schelter Wed, 02 Nov 2011 07:45:55 -0700

Regarding the graph code, I would suggest just removing it. I doubt
anyone will continue to work on that branch.


--sebastian

On 02.11.2011 14:38, Grant Ingersoll wrote:
> Perhaps it would make sense to move them to a branch?  I know we never 
> released them, but it seems a shame for them to be buried in SVN history.
> 
> Perhaps we should have an "attic" branch or a "sandbox" branch, where things 
> like this (and Watchmaker) can go to age w/o necessarily being relegated to 
> the big bitbucket in the sky that is previous revisions.  I suspect it will 
> be easier for someone to pick up should things improve later than having to 
> dig through SVN history.  Besides, one might be careful about drawing 
> conclusions on performance just yet given the state of Hadoop.  As the 
> overhead issues get worked through, some of this stuff may not be as bad.   I 
> guess the question is, is it the paradigm that is slow or the implementation 
> of the paradigm?  (That being said, I do think it's likely the case that this 
> stuff moves to Giraph)
> 
> 
> On Nov 2, 2011, at 8:49 AM, Sebastian Schelter wrote:
> 
>> I was referring to the class of "classical" graph algorithms (like
>> shortest path, min cut, betweenness, triangle enumeration, etc) that Jake
>> was also talking about.
>>
>> --sebastian
>>
>> On 02.11.2011 13:13, Dan Brickley wrote:
>>> On 2 November 2011 10:24, Sebastian Schelter <[email protected]> wrote:
>>>> As you might know I recently started an experimental graph mining
>>>> module. I was already concerned at the beginning of this whether
>>>> MapReduce is really a suitable platform for (most) graph algorithms.
>>>>
>>>> I'm not content with the performance of the algorithms after some
>>>> testing and I'm pretty sure the future of large-scale graph processing
>>>> is not on MapReduce (but hopefully on a Pregel-like platform such as
>>>> Giraph).
>>>>
>>>> As we're currently removing clutter and trying to concentrate on the
>>>> core algorithms, I suggest removing all graph algorithms with the
>>>> exception of PageRank.
>>>>
>>>> If no one objects to this, I'll start the cleanup in a few days.
>>>
>>> It all depends what you mean by 'graph algorithms', as Jake more or
>>> less says. I take your point re shortest paths etc. However it would
>>> be a mistake I think to send out a message that Mahout isn't good for
>>> consuming graph data, even while Hadoop certainly has issues with some
>>> kinds of graph-processing.
>>>
>>> All this can be something of a matter of perspective and descriptive
>>> gloss. Much of the work of the recommender / Taste component of Mahout
>>> can be thought of (and marketed as?) consuming a specialist flavour of
>>> graph data. Something like an 'interest graph' (a
>>> http://en.wikipedia.org/wiki/Bipartite_graph) where the nodes are
>>> items or users, and the affinities/associations are indications of
>>> interest (possibly date-stamped, possibly weighted).
>>>
>>> I work a lot with factual graph data expressed in W3C RDF form; in
>>> this case our 'graph' has nodes that are entities or atomic values,
>>> and links that are different typed links, representing relationship
>>> types, attributes/properties etc.  Depending on the task in hand this
>>> can be consumed in Mahout by munging it into recommendations format
>>> input, or as with CSV input, into vectors, etc. So again it's 'graph'
>>> data processing even if the processing paradigm isn't from graph
>>> theory.
>>>
>>> Finally the spectral clustering piece of Mahout also takes graph input
>>> (affinities) and there are decades of research papers that account for
>>> this in terms of eigenvectors/values of Laplacian representations of
>>> the graph affinity matrix; so I'd also count that as a Mahout tool for
>>> (I guess 'lossy' in Jake's terminology) graph processing.
>>>
>>> Or am I being too marketing-minded here? Is it fair to say "Mahout is
>>> a toolkit that can do specific useful things with various forms of
>>> graph-shaped data, but isn't a general-purpose graph processing
>>> environment"?
>>>
>>> Dan
>>
> 
> --------------------------------------------
> Grant Ingersoll
> http://www.lucidimagination.com
> 
> 
> 
>
