I haven't had time to try it yet, but I'm really interested in it. As far as I know, it's an implementation of Google's Pregel paper. One of my current research goals is to implement a bunch of graph algorithms on top of M/R to get a feeling for which properties of the underlying system would need to change to make the algorithms faster and easier to implement. Maybe Pregel/Giraph is already the answer to that.

I'd be very open to playing with such a system, but as far as integration in Mahout goes, it's a very tough question which other systems should be supported and how we would proceed to integrate them. From my experience, it's already hard enough for a lot of users to get our Hadoop code running...

--sebastian

On 05.09.2011 06:42, Jake Mannix wrote:
Hey gang,

   Has anyone here played much with
Giraph <http://incubator.apache.org/giraph/> (currently in the
Apache Incubator)?  One of my co-workers ran it on our
corporate Hadoop cluster this past weekend, and found it did a very fast
PageRank computation (far faster than even well-tuned M/R code on the same
data), and it worked pretty close to out of the box.  Seems like that style
of computation (in-memory distributed datasets), as used by Giraph (and the
recently-discussed-on-this-list GraphLab<http://graphlab.org/>, and
Spark<http://www.spark-project.org/>, and
Twister<http://www.iterativemapreduce.org/>, and Vowpal
Wabbit<http://hunch.net/~vw/>,
and probably a few others) is more and more the way to go for a lot of the
things we want to do - scalable machine learning.  "RAM is the new Disk, and
Disk is the new Tape" after all...

   Giraph in particular seems nice, in that it runs on top of "old fashioned"
Hadoop - it takes up (long-lived) Mapper slots on your regular cluster,
spins up a ZK cluster if you don't supply the location of one, and is all in
Java (which may be a minus for some people, I guess, but it beats having to run some
big exec'ed-out C++ code (GraphLab, VW), or run on top of (admittedly
awesome) Mesos (Spark [which, while running on the JVM, is written in Scala]),
or run its own totally custom inter-server communication and data structures
(Twister and many of the others)).

   Seems we should be not just supportive of this kind of thing, but try and
find some common ground and integration points.

   -jake

