No worries. I'd like to add that, as Haiko suggests, the non-hierarchical SBM is much quicker and needs orders of magnitude less RAM. So it might also run on a recently bought laptop with very acceptable running times, even at your network scale. I'm not sure how network density might affect this, though.
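To make the density question a bit more concrete, here is a minimal back-of-envelope sketch in Python (plain Python, not graph-tool code). It only estimates how many edges an undirected simple graph has at a given density and how large a bare edge list would be. The function name `edge_estimate` and the 16 bytes per edge (two 64-bit integer IDs) are assumptions for illustration; the memory an SBM run actually needs is likely much higher and is not modelled here.

```python
def edge_estimate(num_nodes: int, density: float, bytes_per_edge: int = 16):
    """Return (edge_count, raw_bytes) for an undirected simple graph.

    density is the fraction of all possible node pairs that are connected.
    bytes_per_edge=16 assumes two 64-bit integer IDs per edge (an assumption,
    not a statement about graph-tool's internal storage).
    """
    if not 0.0 <= density <= 1.0:
        raise ValueError("density must be between 0 and 1")
    max_edges = num_nodes * (num_nodes - 1) // 2  # all possible node pairs
    edges = round(max_edges * density)
    return edges, edges * bytes_per_edge


if __name__ == "__main__":
    # Illustrative densities only; the real density of the network is unknown.
    for density in (0.0001, 0.01, 0.5):
        edges, raw = edge_estimate(100_000, density)
        print(f"density={density}: {edges:,} edges, ~{raw / 1e9:.1f} GB raw edge list")
```

With 100k nodes there are about 5 billion possible edges, so the edge count is driven almost entirely by the density.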
Cheers,
Felix

> On Friday, Jun 14, 2019 at 11:59 AM, Jean Christophe Cazes
> <[email protected]> wrote:
> Thank you so much for those quick answers!
>
> On Fri, 14 Jun 2019 at 11:57, Felix Victor Münch <[email protected]>
> wrote:
> > Hi Jean,
> >
> > I did a hierarchical SBM on a Twitter follow network (pretty sparse
> > actually in comparison to other kinds of networks) with ca. 250k accounts.
> > It took about "11 days for one run, with 60 vCPUs and a peak usage of ca.
> > 400 GB RAM" (my PhD thesis, https://eprints.qut.edu.au/125543/, p. 251).
> > How useful it is theoretically depends on your goals and discipline. In my
> > case, feel free to read the referenced chapter and decide for yourself.
> >
> > I also did runs on smaller networks with ca. 100k-150k nodes. Running time
> > was still in about the same ballpark, or even longer … I guess because the
> > network structure was more complex.
> >
> > The amount of RAM is pretty much mandatory. Swap memory won't help you
> > much, as it slows the algorithm down to a degree that makes the running
> > time infeasible. The number of CPUs could be lower I guess, because much
> > of the algo seemed to run serially and those parts made up most of the
> > calculation time.
> >
> > I had a university/QRIScloud (https://www.qriscloud.org.au/) provided VM
> > with 60 cores and 900 GB RAM. On a Google VM this would have been pretty
> > costly. I adjusted epsilon for less accuracy and greater speed:
> >
> >     state = gt.inference.minimize_nested_blockmodel_dl(
> >         core, verbose=True,
> >         mcmc_equilibrate_args={'epsilon': 1e-2})
> >
> > (https://github.com/FlxVctr/PhD-code/blob/master/1000%2B%20nested%20SBM.ipynb)
> >
> > If I hadn't had the computing resources for free, I wouldn't have done it.
> >
> > I'd recommend testing Infomap if you're looking for a more efficient
> > alternative (https://www.mapequation.org/index.html) that also works with
> > entropy minimization (even though it's more flow oriented). Just yesterday
> > I did a community detection (non-hierarchical) on a 181k network in a
> > matter of seconds on a last-year's MacBook Pro. I don't know how long it
> > takes for a hierarchical structure, but it can do this, so it's worth a
> > try.
> >
> > Also efficient, but with all the drawbacks that Tiago Peixoto elaborates
> > on in his papers (which I also refer to in the chapter linked above), and
> > not hierarchical, is the parallelised modularity maximisation
> > (Parallelised Louvain Method, PLM) in NetworKit
> > (https://networkit.github.io/). Despite its theoretical and statistical
> > drawbacks it delivers good heuristic evidence for communities in networks
> > imho. But that depends a lot on what you want to do.
> >
> > Cheers,
> >
> > Felix
> >
> > Dr. Felix Victor Münch
> > Postdoc Researcher
> > Leibniz Institut for Media Research | Hans-Bredow-Institut (HBI), Hamburg
> > https://leibniz-hbi.de/
> > https://felixvictor.net
> >
> > > On Friday, Jun 14, 2019 at 10:16 AM, Lietz Haiko <[email protected]>
> > > wrote:
> > > Hi Jean,
> > >
> > > the answer also depends on how complicated the desired SBM is. A layered
> > > model takes longer than an unlayered one.
> > >
> > > Modeling a graph with 100k nodes should take very long. But I'd also be
> > > interested in a more informed answer...
> > >
> > > Haiko
> > >
> > > From: graph-tool [[email protected]] on behalf of Jean Christophe
> > > Cazes [[email protected]]
> > > Sent: Friday, 14 June 2019 09:59
> > > To: [email protected]
> > > Subject: [graph-tool] [SBM on Dense Graphs]
> > >
> > > Hello, I intend to use graph_tool for a big network, 100k+ nodes and
> > > very dense.
> > >
> > > The dataset I'm working with at the moment is a ~40-50 GB CSV containing
> > > vertices and edges as transactions.
> > >
> > > Is it realistic to try SBM on such a graph computationally, and would
> > > it be theoretically useful?
> > >
> > > If it isn't computationally realistic, how big can my subgraph be in
> > > order to be feasible?
> > >
> > > Note: I will rent a Google Cloud Platform VM to do so.
> > >
> > > Thank you
_______________________________________________
graph-tool mailing list
[email protected]
https://lists.skewed.de/mailman/listinfo/graph-tool
