Hi again, I am trying to estimate the minimum requirements for running graph analysis over my input data.
The shortest path example says: "The first thing that happens is that getSplits() is called by the master and then the workers will process the InputSplit objects with the VertexReader to load their portion of the graph into memory". What I understood from this is that at some time T all graph nodes must be loaded into cluster memory. If I have 100 GB of graph data, will I need 25 machines with 4 GB of RAM each? If that is the case, I have a big memory problem analyzing 4 TB of data :)

Best regards.
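
P.S. To make my estimate concrete, here is the back-of-envelope arithmetic as a rough sketch. The function name `machines_needed` is just something I made up for illustration. It assumes the graph takes the same space in memory as on disk, that all of each machine's RAM is available to the worker, and that 1 TB = 1024 GB. None of these assumptions is likely to hold exactly in practice, so please read the results as a crude first guess only.

```python
import math


def machines_needed(graph_gb: float, ram_per_machine_gb: float) -> int:
    """Naive estimate: how many machines are needed to hold the whole graph in RAM.

    Assumes in-memory size == on-disk size and that all RAM is usable
    by the worker (both are simplifying assumptions, not measured facts).
    """
    return math.ceil(graph_gb / ram_per_machine_gb)


# 100 GB of graph data on machines with 4 GB of RAM each
print(machines_needed(100, 4))

# 4 TB of graph data (taking 1 TB = 1024 GB) on the same machines
print(machines_needed(4 * 1024, 4))
```

Under these assumptions the first case gives the 25 machines I mentioned, and the 4 TB case gives 1024 machines, which is why I am worried.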