Hey Jake, thanks for the reply.  I'll look at GIRAPH-45 for this particular topic.
Really quick though: I thought that Pregel was an implementation of BSP (a
programming model, completely orthogonal to the manner in which data is
retrieved/stored).  It seems quite reasonable to implement a basic caching
strategy for the case where all of a worker's vertices don't fit in memory.
Thanks again for your input.  I'll direct my question to the GIRAPH-45 topic.
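(The "basic caching strategy" mentioned above could, in principle, look like an LRU vertex cache. This is purely a hypothetical sketch and not Giraph code: `VertexCache`, its type parameters, and the eviction hook are illustrative names, and a real worker would spill evicted vertices to disk rather than just dropping them.)

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of an LRU cache for a worker whose vertices
// don't all fit in memory. Not a Giraph API -- illustrative only.
public class VertexCache<I, V> {
    private final int capacity;
    private final LinkedHashMap<I, V> cache;

    public VertexCache(final int capacity) {
        this.capacity = capacity;
        // accessOrder = true makes iteration order least-recently-accessed
        // first, which gives LRU eviction semantics.
        this.cache = new LinkedHashMap<I, V>(capacity, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<I, V> eldest) {
                // In a real worker, the evicted vertex would be spilled
                // to disk here before being removed from memory.
                return size() > capacity;
            }
        };
    }

    public V get(I id) {
        // A null result would signal that a disk load is needed.
        return cache.get(id);
    }

    public void put(I id, V vertex) {
        cache.put(id, vertex);
    }

    public int size() {
        return cache.size();
    }
}
```

With capacity 2, putting a third vertex evicts whichever of the first two was least recently accessed.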


From: Jake Mannix <jake.man...@gmail.com>
Date: Wed, 1 Feb 2012 00:01:02 -0600
To: giraph-user@incubator.apache.org
Subject: Re: Caching (with LRU or something) strategy in Giraph?

Hi David,

  The *point* of the Pregel architecture (which Giraph is an implementation of) 
is that the whole graph is in (distributed) memory.  If you are willing to go 
to disk, doing your calculations via MapReduce (possibly talking to a 
distributed hashtable of some kind colocated with your hadoop cluster, if it 
helps) is the straightforward way to go.


On Tue, Jan 31, 2012 at 9:34 PM, David Garcia
<dgar...@potomacfusion.com> wrote:
I haven't investigated too deeply into this... but is there a caching strategy
implemented, or in the works, for getting around having to load all of a
split's vertices into memory?  If a graph is large enough, even a reasonably
sized cluster may not have enough memory to load all the vertices.  Does Giraph
address this currently?

