Hi Sebastian, I definitely agree with you on this one.
I'm currently working on it, but I'm a bit stuck on a small bug that seems to come from a concurrency issue we don't understand yet (I have a 4-liner that reproduces it, if you want to help out). Avery and I are currently discussing the possibility of writing a paper on the solution, so hopefully I'll be able to tell you more in a couple of weeks.

On Thu, May 3, 2012 at 3:44 PM, Sebastian Schelter <[email protected]> wrote:
> Hi,
>
> I'd like to ask whether someone is currently working on out-of-core
> messaging for Giraph (e.g. by spilling messages to disk in case of
> memory pressure).
>
> I ran some experiments with Giraph on a small 6-machine cluster and got
> really nice results for smaller datasets such as the wikipedia pagelink
> graph (6M vertices, ~250M edges in its undirected version).
>
> For larger graphs with an even more skewed degree distribution such as
> the twitter follower graph from [1], Giraph crashes in the first
> superstep unfortunately. My colleagues observed the same, when they ran
> benchmarks of Giraph against the Stratosphere system [2], where Giraph
> did kind of well for small datasets, but again crashed for larger ones...
>
> I think the lack of out-of-core messages is currently the biggest
> obstacle to recommending people to test Giraph in production use.
>
> Best,
> Sebastian
>
>
> [1] http://konect.uni-koblenz.de/networks/twitter
> [2] http://www.stratosphere.eu/

--
Claudio Martella
[email protected]
