Hi, I've been struggling with a similar problem and that's why i've started working on the out-of-core message management, when the memory shrinks. Your particular problem can get to an upper bound of exponential space complexity, which you experience with the OOM. the possible paths you can extract for each source vertex are about O(d^l), where d is the average degree of the graph and l is the max length of the extracted path (if you do shortest paths, then it's the diameter).

## Advertising

Giraph is all good and fast but it's all in-memory and for this reason it currently lacks a solution to your problem. I suggest you wait until GIRAPH-45 is ready (I should write an email tonight about that patch). Hope it makes sense to you, Claudio On Mon, Jan 23, 2012 at 10:56 AM, André Kelpe <efeshundert...@googlemail.com> wrote: > Hi list, > > I have been investigating giraph for a week now and I have a huge stability > problem. I am running the trunk against a CDH3u2 hadoop cluster. The problem > I am trying to solve goes as follows: > > In a graph there are 3 kinds of vertices: > > 1: end of the world vertices, which have only 1 connected edge > 2: bivalent vertices, which have exactly 2 connected edges > 3: multivalent vertices that have n-edges connected (n>2) > > The goal is now to calculate for each vertex that is not in category 2 all > the paths to the other reachable non bivalent vertices. One could say all > pathes between all non-bivalent vertices. > > To give an example: > > [9] > | > | > <12> > | > | > [5]-----<13>-----[6]-----<11>-----[7]-----<10>-----[8] > > In this path I want to know that [5] forms a path to [7] via the edges <13> > and <11>. [7] forms a path via <12> with [9], via <10> with [8] and via > <11><13> with [5]. You get the idea... Directionality is not important. > > The algorithm I have is pretty straight forward, in superstep 0 all vertices > that are non-bivalent send a message to their neighbours via which edge they > are reachable. In all following supersteps the bivalent vertices are simply > forwarding this information and the non-bivalent ones are terminating the > algorithm. The messages that they sent are made using Textwritable instances > encoding the path. > > If I run this algorithm on input with 1 million edges it never finishes, the > master process and then the others always go out of memory, even with a 10GB > heap. I know that java programs can be memory hungry, but 10GB heap for > processing a 40MB input file is a bit to much in book. I have tried all sorts > of settings, like disabling checkpoints, but nothing makes it finish. I also > see a slowdown in processing, the first 20ish supersteps are done in no time, > but then the processing slows down until it crashes in superstep 47. > > My questions are: What am I doing wrong? Do you guys have any pointers for > things to look after? How can I get this thing to finish? Why is it so memory > hungry? > > Thanks a lot for your help! > > -André -- Claudio Martella claudio.marte...@gmail.com