Wow, i'll try and give my results and computing time if you want :) Thank you for providing answers and for graph_tool !
Le ven. 14 juin 2019 à 13:46, Tiago de Paula Peixoto <[email protected]> a écrit : > Hi, > > The important quantity is the number of edges E, not the number of nodes. > > The functions minimize_nested_blockmodel_dl() and minimize_blockmodel_dl() > have a complexity of O(E log² N), but the nested version takes longer. > > As was already mentioned, the nested model has a large memory footprint, > since several state copies need to be kept. > > There is an alternative approach that takes far less memory, which is to > instantiate a NestedBlockState object and just call multiflip_mcmc_sweep() > until equilibration. This is very competitive with the functions above, but > tend to finish much quicker and use less memory. > > This is covered in the HOWTO: > > > https://graph-tool.skewed.de/static/doc/demos/inference/inference.html#id15 > > (Remember to replace mcmc_sweep() with multiflip_mcmc_sweep() in the code > above. I had forgotten to make this change the last release...) > > Best, > Tiago > > > Am 14.06.19 um 12:06 schrieb Felix Victor Münch: > > No worries. I'd like to add, that, as Haiko suggests, the > non-hierarchical > > SBM is much quicker and needs magnitudes less RAM. So it might also run > on a > > recently bought laptop at very acceptable running times, even at your > > network scale. But not sure about how network density might affect this. > > > > Cheers, > > > > Felix > > > > > > > > On Friday, Jun 14, 2019 at 11:59 AM, Jean Christophe Cazes > > <[email protected] > > <mailto:[email protected]>> wrote: > > Thank you so much for those quick answers ! > > > > Le ven. 14 juin 2019 à 11:57, Felix Victor Münch > > <[email protected] <mailto:[email protected]>> a > écrit : > > > > Hi Jean, > > > > > > I did a hierarchical SBM on a Twitter follow network (pretty > sparse > > actually in comparison to other kinds of networks) with ca. 250k > > accounts. It took about "11 days for one run, with 60 vCPUs and a > > peak usage of ca. 400 GB RAM" (my PhD > > thesis, https://eprints.qut.edu.au/125543/, p. 251). How useful > it > > is theoretically depends on your goals and discipline. In my > case, > > feel free to read the referenced chapter and decide for yourself. > > > > I also did runs on smaller networks with ca 100k -150k nodes. > > Running time was still about in the same ball park, or even > longer … > > I guess because the network structure was more complex. > > > > The amount of RAM is pretty mandatory. Swap memory won't help you > > much, as it slows the algorithm down to a degree that makes the > > running time infeasible. Number of CPUs could be lower I guess, > > because much of the algo seemed to run serially and those parts > made > > up most of the calculation time. > > > > I had a university/QRIScloud (https://www.qriscloud.org.au/) > > provided VM with 60 cores and 900 GB RAM. On a Google VM this > would > > have been pretty costly. I adjusted epsilon for less accuracy and > > greater speed: > > > > state = gt.inference.minimize_nested_blockmodel_dl(core, > verbose=True, mcmc_equilibrate_args={'epsilon': 1e-2}, ) > > > > ( > https://github.com/FlxVctr/PhD-code/blob/master/1000%2B%20nested%20SBM.ipynb > ) > > > > If I wouldn't have had the computing ressources for free I > wouldn't > > have done it. > > > > I'd recommend to test infomap if you're looking for a more > efficient > > alternative (https://www.mapequation.org/index.html) that also > works > > with entropy minimization (even though it's more flow oriented). > > Just did a 181k network community detection (non-hierarchical) > in a > > matter of seconds on a last-year's Macbook Pro yesterday. I don't > > know how long it takes for a hierarchical structure, but it can > do > > this, so it's worth a try. > > > > Also efficient, but with all the drawbacks that Tiago Peixoto > > elaborates on in his papers (which I also refer to in the chapter > > linked above), and not hierarchical, is the parallelised > modularity > > maximisation (Parallelised Louvain Method) PLM in NetworKit > > (https://networkit.github.io/). Despite it's theoretical and > > statistical drawbacks it delivers good heuristical evidence for > > communities in networks imho. But that depends a lot on what you > > want to do > > > > > > Cheers, > > > > > > Felix > > > > > > > > *Dr. Felix Victor Münch* > > Postdoc Researcher > > Leibniz Institut for Media Research | Hans-Bredow-Institut (HBI), > > Hamburg > > https://leibniz-hbi.de/ > > https://felixvictor.net <https://felixvictor.net/> > > > > > > On Friday, Jun 14, 2019 at 10:16 AM, Lietz Haiko > > <[email protected] <mailto:[email protected]>> > wrote: > > Hi Jean, > > > > the answer also depends how complicated the desired SBM is. A > > layered model takes longer than an unlayered one. > > > > Modeling a graph with 100k nodes should take very long. But > I'd > > also be interested in a more informed answer... > > > > Haiko > > > > > > > > > ---------------------------------------------------------------------------- > > *Von:* graph-tool [[email protected] > > <mailto:[email protected]>]" im Auftrag von "Jean > > Christophe Cazes [[email protected] > > <mailto:[email protected]>] > > *Gesendet:* Freitag, 14. Juni 2019 09:59 > > *An:* [email protected] <mailto:[email protected]> > > *Betreff:* [graph-tool] [SBM on Dense Graphs] > > > > Hello, I intend to use graph_tool for a big network, +100k > nodes > > and very dense. > > > > The dataset i'm working with at the moment is ~ 40/50 GB csv > > containing vertices and edges as transactions. > > > > Is it realistic to try SBM on such graph both > computationnally > > and would this be theoretically useful? > > > > If it isnt computationnally, how big can my subgraph be in > order > > to be feasible? > > > > Note: I will rent a Google Cloud Platform VM to do so. > > > > Thank you > > > > _______________________________________________ > > graph-tool mailing list > > [email protected] <mailto:[email protected]> > > https://lists.skewed.de/mailman/listinfo/graph-tool > > > > _______________________________________________ > > graph-tool mailing list > > [email protected] <mailto:[email protected]> > > https://lists.skewed.de/mailman/listinfo/graph-tool > > > > _______________________________________________ > > graph-tool mailing list > > [email protected] > > https://lists.skewed.de/mailman/listinfo/graph-tool > > > > > > _______________________________________________ > > graph-tool mailing list > > [email protected] > > https://lists.skewed.de/mailman/listinfo/graph-tool > > > > > -- > Tiago de Paula Peixoto <[email protected]> > > _______________________________________________ > graph-tool mailing list > [email protected] > https://lists.skewed.de/mailman/listinfo/graph-tool >
_______________________________________________ graph-tool mailing list [email protected] https://lists.skewed.de/mailman/listinfo/graph-tool
