I haven't looked into this too deeply, but is there a caching strategy implemented, or in the works, that avoids having to load all of a split's vertices into memory? If a graph is large enough, even a reasonably sized cluster may not have enough memory to hold all the vertices. Does Giraph address this currently?
-David
