I definitely agree with you on this one.
I'm currently working on it, but I'm stuck on a small bug that seems to
come from a concurrency issue we can't yet explain (I have a four-line
snippet that reproduces it, if you want to help out). Avery and I
are currently discussing the possibility of writing a paper on the
solution, so I should be able to tell you more in a couple of
weeks.
On Thu, May 3, 2012 at 3:44 PM, Sebastian Schelter <s...@apache.org> wrote:
> I'd like to ask whether someone is currently working on out-of-core
> messaging for Giraph (e.g. by spilling messages to disk in case of
> memory pressure).
> I ran some experiments with Giraph on a small 6-machine cluster and got
> really nice results for smaller datasets such as the Wikipedia pagelink
> graph (6M vertices, ~250M edges in its undirected version).
> For larger graphs with an even more skewed degree distribution, such as
> the Twitter follower graph from [1], Giraph unfortunately crashes in the
> first superstep. My colleagues observed the same when they ran
> benchmarks of Giraph against the Stratosphere system [2], where Giraph
> did fairly well for small datasets, but again crashed for larger ones...
> I think the lack of out-of-core messages is currently the biggest
> obstacle to recommending that people test Giraph in production use.
> [1] http://konect.uni-koblenz.de/networks/twitter
> [2] http://www.stratosphere.eu/