Fantastic! Thanks a lot!

Sent from my mobile device; please excuse typos and terseness.
> On Jul 14, 2014, at 4:42 AM, Tiago de Paula Peixoto <[email protected]> wrote:
>
>> On 07/14/2014 07:11 AM, Helen Lampesis wrote:
>> Dear graphErs,
>>
>> I am about to start a project working on a graph with about 50M nodes
>> and 1B edges, and I want your opinion regarding the feasibility of this
>> endeavor with graph_tool.
>>
>> Can you please share your experience with graphs of comparable size?
>>
>> I mostly want to calculate centrality measures, and I will need to
>> apply (several) filters to isolate nodes with particular attributes.
>>
>> On top of that, I am running on a supercomputer (so memory is NOT an
>> issue), and if I am lucky they have installed/enabled the parallel
>> version of the library.
>
> You should be able to tackle graphs of this size, provided you have enough
> memory. For centrality calculations, graph-tool has pure C++ parallel
> code, so you should see good performance.
>
> Graph filtering can also be done without involving Python loops, so it
> should scale well too.
>
> Just as an illustration, for the graph size you suggested:
>
> In [1]: g = random_graph(50000000, lambda: poisson(40), random=False,
>                          directed=False)
> In [2]: %time pagerank(g)
> CPU times: user 3min 26s, sys: 44 s, total: 4min 10s
> Wall time: 11.3 s
>
> So, pagerank takes about 11 seconds on a machine with 32 cores (it would
> have taken around 3-4 minutes in a single thread), and it takes about 50
> GB of RAM to store the graph.
>
> Best,
> Tiago
>
> --
> Tiago de Paula Peixoto <[email protected]>
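
For what it's worth, here is a minimal sketch of the filter-then-measure
workflow Tiago describes above, scaled down to a toy graph. The vertex
attribute "kind", the value 3, and all the sizes are made up for
illustration; GraphView with a boolean vertex property and pagerank are the
actual graph-tool pieces involved.

import numpy as np
from graph_tool.all import Graph, GraphView, pagerank

# Toy stand-in for the real 50M-node / 1B-edge graph.
g = Graph(directed=False)
g.add_vertex(1000)
g.add_edge_list(np.random.randint(0, 1000, size=(5000, 2)))

# Hypothetical integer attribute attached to every vertex.
kind = g.new_vertex_property("int")
kind.a = np.random.randint(0, 5, g.num_vertices())

# Boolean filter built with a vectorized comparison (no Python loop).
keep = g.new_vertex_property("bool")
keep.a = (kind.a == 3)

# GraphView applies the filter lazily; the underlying graph is not copied.
sub = GraphView(g, vfilt=keep)
print(sub.num_vertices())

# Centrality on the filtered view; this dispatches to parallel C++ code.
pr = pagerank(sub)

Since GraphView never copies the edge list, stacking several filtered views
should add only the boolean property maps themselves, not another copy of
the graph on top of the ~50 GB quoted above.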
