12 gigs; it uses several times (up to 10?) more memory than the size of the dataset.

2012/10/24 Shuo Wang <ecisp.wangs...@gmail.com>

> How large is your data? Our cluster has 10 nodes and 45 tasks, and each task has
> 512M of memory. But when I run the 200M data, it fails with an OutOfMemory error.
>
> 2012/10/24 Thomas Jungblut <thomas.jungb...@gmail.com>
>
> > Sure it does run, if you have enough RAM ;)
> >
> > 2012/10/24 Shuo Wang <ecisp.wangs...@gmail.com>
> >
> > > How much data have you run PageRank on with HAMA? Does it run? I want to
> > > run PageRank on large data with HAMA, but it always fails.
> > >
> > > 2012/10/24 Thomas Jungblut <thomas.jungb...@gmail.com>
> > >
> > > > Yes, it works on any directed graph.
> > > > The best format to use is
> > > >
> > > > Vertex <\t> AdjacentVertex1 <\n> AdjacentVertex2 etc.
> > > >
> > > > So you have an adjacency list, and each line represents one vertex.
> > > > This is splittable, which the web-Google dataset is not.
> > > >
> > > > 2012/10/24 Shuo Wang <ecisp.wangs...@gmail.com>
> > > >
> > > > > Thanks! Does PageRank work on any web graph? I generated a random
> > > > > web graph in the same format as web-Google.txt, but the result is
> > > > > infinity.
> > > > >
> > > > > 2012/10/24 Thomas Jungblut <thomas.jungb...@gmail.com>
> > > > >
> > > > > > Because graph iterations != supersteps. You have to take into
> > > > > > account the partitioning and the time to accumulate the number of
> > > > > > vertices. PageRank requires an additional superstep to run
> > > > > > aggregators.
> > > > > >
> > > > > > 2012/10/24 Shuo Wang <ecisp.wangs...@gmail.com>
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > I ran PageRank on HAMA with the max iteration set to 20, but it
> > > > > > > ran 48 supersteps. Why?
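For readers following the thread, here is a minimal single-machine sketch in plain Java. It is not Hama's BSP or vertex API. It assumes an input of one vertex per line with its out-neighbors separated by tabs, which is one reading of "each line represents one vertex". The class and method names (`PageRankSketch`, `parse`, `pageRank`) are hypothetical. Each iteration has a message phase, where every vertex splits its rank over its out-links, and a global step that sums the rank of vertices with no out-links. In a BSP job, that kind of global sum is what an aggregator collects, and Thomas notes it costs an additional superstep. The sketch also shows one common way ranks stay finite on generated graphs: edge targets that have no line of their own are added as vertices, and vertices with no out-links are never divided by an out-degree of 0. Whether that caused the infinite results above is an assumption, not a diagnosis.

```java
import java.util.*;

public class PageRankSketch {

    // Parses lines of the assumed form "vertex<TAB>neighbor1<TAB>neighbor2...".
    static Map<String, List<String>> parse(List<String> lines) {
        Map<String, List<String>> graph = new LinkedHashMap<>();
        for (String line : lines) {
            if (line.isBlank()) continue;
            String[] parts = line.split("\t");
            List<String> out = graph.computeIfAbsent(parts[0], k -> new ArrayList<>());
            for (int i = 1; i < parts.length; i++) {
                out.add(parts[i]);
            }
        }
        // Collect edge targets first, then add any missing ones as vertices without out-links
        // (modifying the map while iterating its values would throw).
        List<String> targets = new ArrayList<>();
        for (List<String> out : graph.values()) targets.addAll(out);
        for (String t : targets) graph.putIfAbsent(t, new ArrayList<>());
        return graph;
    }

    // Power-iteration PageRank with damping; rank of dangling vertices is spread evenly.
    static Map<String, Double> pageRank(Map<String, List<String>> graph, double damping, int maxIterations) {
        int n = graph.size();
        Map<String, Double> rank = new HashMap<>();
        for (String v : graph.keySet()) rank.put(v, 1.0 / n);

        for (int iter = 0; iter < maxIterations; iter++) {
            // Message phase: each vertex sends rank / outDegree to each out-neighbor.
            Map<String, Double> incoming = new HashMap<>();
            double danglingMass = 0.0;
            for (Map.Entry<String, List<String>> e : graph.entrySet()) {
                double r = rank.get(e.getKey());
                List<String> out = e.getValue();
                if (out.isEmpty()) {
                    // No out-links: collect the rank instead of computing r / 0 (Infinity in Java).
                    danglingMass += r;
                } else {
                    for (String t : out) incoming.merge(t, r / out.size(), Double::sum);
                }
            }
            // Global step: the dangling-mass sum is the kind of value an aggregator would collect.
            Map<String, Double> next = new HashMap<>();
            for (String v : graph.keySet()) {
                double in = incoming.getOrDefault(v, 0.0);
                next.put(v, (1 - damping) / n + damping * (in + danglingMass / n));
            }
            rank = next;
        }
        return rank;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "1\t2\t3",
            "2\t3",
            "3\t1",
            "4\t3\t5"); // vertex 5 has no line of its own, only an incoming edge
        Map<String, Double> rank = pageRank(parse(lines), 0.85, 20);
        double total = 0;
        for (Map.Entry<String, Double> e : new TreeMap<>(rank).entrySet()) {
            System.out.printf("%s %.4f%n", e.getKey(), e.getValue());
            total += e.getValue();
        }
        System.out.printf("sum %.4f%n", total);
    }
}
```

With the sample lines in `main`, vertex 5 appears only as an edge target. `parse` therefore adds it with no out-links, and each iteration redistributes its rank as dangling mass. The printed ranks stay finite and sum to 1.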