The *point* of the Pregel architecture (of which Giraph is an implementation)
is that the whole graph is held in (distributed) memory. If you are willing
to go to disk, doing your calculations via MapReduce (possibly talking to a
distributed hash table of some kind co-located with your Hadoop cluster, if
that helps) is the straightforward way to go.
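For intuition, here is a toy single-process sketch of the BSP superstep loop
Pregel runs over the in-memory graph, using the classic max-value-propagation
example: every vertex adopts the largest value it has heard, and re-broadcasts
only when its value changes, so the job halts when no messages are in flight.
This is illustrative Java only, not the Giraph API; the class and the tiny
graph are made up for the example.

```java
import java.util.*;

// Toy, single-JVM sketch of Pregel-style supersteps. The whole graph
// (edges + vertex values) sits in memory; vertices communicate only
// through messages delivered between supersteps. Illustrative only.
public class MaxValueBsp {

    public static Map<Integer, Integer> run(Map<Integer, List<Integer>> edges,
                                            Map<Integer, Integer> values) {
        // Superstep 0: every vertex broadcasts its initial value.
        Map<Integer, List<Integer>> inbox = new HashMap<>();
        for (Map.Entry<Integer, Integer> v : values.entrySet()) {
            for (int dst : edges.getOrDefault(v.getKey(), List.of())) {
                inbox.computeIfAbsent(dst, k -> new ArrayList<>()).add(v.getValue());
            }
        }

        // Later supersteps: run until no messages were sent (all halted).
        while (!inbox.isEmpty()) {
            Map<Integer, List<Integer>> next = new HashMap<>();
            for (Map.Entry<Integer, Integer> v : values.entrySet()) {
                int best = v.getValue();
                for (int m : inbox.getOrDefault(v.getKey(), List.of())) {
                    best = Math.max(best, m);
                }
                if (best > v.getValue()) {
                    // Value changed: update and re-broadcast to neighbors.
                    v.setValue(best);
                    for (int dst : edges.getOrDefault(v.getKey(), List.of())) {
                        next.computeIfAbsent(dst, k -> new ArrayList<>()).add(best);
                    }
                } // otherwise the vertex effectively "votes to halt"
            }
            inbox = next;
        }
        return values;
    }

    public static void main(String[] args) {
        // Directed ring 1 -> 2 -> 3 -> 1 with initial values 3, 6, 1.
        Map<Integer, List<Integer>> edges = Map.of(
                1, List.of(2), 2, List.of(3), 3, List.of(1));
        Map<Integer, Integer> values = new HashMap<>(Map.of(1, 3, 2, 6, 3, 1));
        System.out.println(run(edges, values)); // all vertices converge to 6
    }
}
```

The point of the sketch is the access pattern: each superstep touches every
vertex's value and adjacency list, which is why Pregel wants them resident in
memory rather than re-read from disk per iteration.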
On Tue, Jan 31, 2012 at 9:34 PM, David Garcia <dgar...@potomacfusion.com> wrote:
> I haven't investigated this too deeply... but is there a caching
> strategy implemented, or in the works, for getting around having to load
> all of a split's vertices into memory? If a graph is large enough, even a
> reasonably sized cluster may not have enough memory to load all the
> vertices. Does Giraph address this currently?