I'd like to ask whether anyone is currently working on out-of-core
messaging for Giraph (e.g. by spilling messages to disk under memory
pressure).

I ran some experiments with Giraph on a small 6-machine cluster and got
really nice results for smaller datasets such as the Wikipedia pagelink
graph (6M vertices, ~250M edges in its undirected version).

For larger graphs with an even more skewed degree distribution, such as
the Twitter follower graph from [1], Giraph unfortunately crashes in the
first superstep. My colleagues observed the same when they benchmarked
Giraph against the Stratosphere system [2]: Giraph did reasonably well
on small datasets, but again crashed on larger ones.

I think the lack of out-of-core messaging is currently the biggest
obstacle to recommending that people try Giraph in production.


[1] http://konect.uni-koblenz.de/networks/twitter
[2] http://www.stratosphere.eu/

