I'd like to ask whether someone is currently working on out-of-core
messaging for Giraph (e.g. by spilling messages to disk in case of
I ran some experiments with Giraph on a small 6-machine cluster and got
really nice results for smaller datasets such as the wikipedia pagelink
graph (6M vertices, ~250M edges in its undirected version).
For larger graphs with a even more skewed degree distribution such as
the twitter follower graph from , Giraph crashes in the first
superstep unfortunately. My colleagues observed the same, when they ran
benchmarks of Giraph against the Stratosphere system , where Giraph
did kind of well for small datasets, but again crashed for larger ones...
I think the lack of out-of-core messages is currently the biggest
obstacle to recommending people to test Giraph in production use.