Hi, I'd like to ask whether someone is currently working on out-of-core messaging for Giraph (e.g. by spilling messages to disk in case of memory pressure).
I ran some experiments with Giraph on a small 6-machine cluster and got really nice results for smaller datasets such as the wikipedia pagelink graph (6M vertices, ~250M edges in its undirected version). For larger graphs with a even more skewed degree distribution such as the twitter follower graph from [1], Giraph crashes in the first superstep unfortunately. My colleagues observed the same, when they ran benchmarks of Giraph against the Stratosphere system [2], where Giraph did kind of well for small datasets, but again crashed for larger ones... I think the lack of out-of-core messages is currently the biggest obstacle to recommending people to test Giraph in production use. Best, Sebastian [1] http://konect.uni-koblenz.de/networks/twitter [2] http://www.stratosphere.eu/
