Yes, you will need a lot of RAM until we get out-of-core partitions
and/or out-of-core messages. Do you really need to load all 4 TB of
data? The vertex index, vertex value, edge value, and message value
objects all take up space, as do the data structures that store them
(hence your estimates are definitely too low). How big is the actual
graph you are trying to analyze, in terms of vertices and edges?
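As a rough illustration of why the raw data size understates the real requirement, a back-of-envelope estimate might look like the sketch below. The overhead factor of 3x is an assumption standing in for JVM object headers, references, and container structures, not a measured Giraph figure:

```python
# Back-of-envelope estimate of per-worker memory for an in-memory graph load.
# The overhead_factor is an assumed multiplier for in-memory object and
# container overhead (JVM headers, references, hash maps), not a measured value.

def per_worker_memory_gb(raw_data_gb, num_workers, overhead_factor=3.0):
    """Estimate GB of heap each worker needs to hold its graph partition.

    raw_data_gb     -- total size of the on-disk input data in GB
    num_workers     -- number of workers the graph is split across
    overhead_factor -- assumed multiplier for in-memory overhead
    """
    return raw_data_gb * overhead_factor / num_workers

# 100 GB of input across 25 workers: ~12 GB per worker, not 4 GB,
# once object overhead is taken into account.
print(per_worker_memory_gb(100, 25))    # 12.0
# 4 TB of input across the same 25 workers: ~480 GB per worker.
print(per_worker_memory_gb(4000, 25))   # 480.0
```

The point is that the multiplier, whatever its true value, applies to the whole data set, so without out-of-core support the cluster's aggregate heap has to exceed the input size by that factor.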
On 2/19/12 10:45 PM, yavuz gokirmak wrote:
I am trying to estimate the minimum requirements for running graph
analysis over my input data.
In the shortest-path example it is said that:
"The first thing that happens is that getSplits() is called by the
master and then the workers will process the InputSplit objects with
the VertexReader to load their portion of the graph into memory"
What I understood is that at a given time T, all graph nodes must be
loaded in memory.
If I have 100 GB of graph data, will I need 25 machines with 4 GB of
memory each?
If this is the case, I have a big memory problem for analyzing 4 TB of data :)