No worries. I'd like to add that, as Haiko suggests, the non-hierarchical SBM 
is much quicker and needs orders of magnitude less RAM. So it might also run on 
a recently bought laptop in very acceptable running times, even at your network 
scale. But I'm not sure how network density might affect this.

Cheers,

Felix

> On Friday, Jun 14, 2019 at 11:59 AM, Jean Christophe Cazes 
> <[email protected]> wrote:
> Thank you so much for those quick answers!
> On Fri, Jun 14, 2019 at 11:57 AM, Felix Victor Münch 
> <[email protected]> wrote:
> > Hi Jean,
> >
> >
> > I did a hierarchical SBM on a Twitter follow network (actually pretty 
> > sparse in comparison to other kinds of networks) with ca. 250k accounts. 
> > It took about "11 days for one run, with 60 vCPUs and a peak usage of ca. 
> > 400 GB RAM" (my PhD thesis, https://eprints.qut.edu.au/125543/, p. 251). 
> > Whether it is theoretically useful depends on your goals and discipline. 
> > For my case, feel free to read the referenced chapter and decide for yourself.
> >
> > I also did runs on smaller networks with ca. 100k–150k nodes. Running time 
> > was still roughly in the same ballpark, or even longer … I guess because 
> > the network structure was more complex.
> >
> > That amount of RAM is pretty much mandatory. Swap memory won't help you 
> > much, as it slows the algorithm down to a degree that makes the running 
> > time infeasible. The number of CPUs could probably be lower, because much 
> > of the algorithm seemed to run serially, and those serial parts made up 
> > most of the calculation time.
> >
> > I had a university/QRIScloud (https://www.qriscloud.org.au/) provided VM 
> > with 60 cores and 900 GB RAM. On a Google VM this would have been pretty 
> > costly. I adjusted epsilon for less accuracy and greater speed:
> >
> > > state = gt.inference.minimize_nested_blockmodel_dl(core, verbose=True, 
> > > mcmc_equilibrate_args={'epsilon': 1e-2})
> > (https://github.com/FlxVctr/PhD-code/blob/master/1000%2B%20nested%20SBM.ipynb)
> >
> > If I hadn't had the computing resources for free, I wouldn't have done it.
> >
> > If you're looking for a more efficient alternative, I'd recommend testing 
> > Infomap (https://www.mapequation.org/index.html), which also works via 
> > entropy minimization (even though it's more flow-oriented). Just yesterday 
> > I ran a (non-hierarchical) community detection on a 181k-node network in a 
> > matter of seconds on last year's MacBook Pro. I don't know how long it 
> > takes for a hierarchical structure, but it can do that too, so it's worth a try.
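> > For reference, a minimal sketch of what that looks like with Infomap's 
> > Python package (the `infomap` package on PyPI; the toy edge list below is 
> > just an illustration, not the network from the run above):

```python
from infomap import Infomap

# toy graph: two triangles joined by a single bridge edge
edges = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3), (2, 3)]

im = Infomap(two_level=True, silent=True)  # non-hierarchical run
for u, v in edges:
    im.add_link(u, v)
im.run()

print(im.num_top_modules)  # number of detected communities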
> >
> > Also efficient, but with all the drawbacks that Tiago Peixoto elaborates 
> > on in his papers (which I also refer to in the chapter linked above), and 
> > not hierarchical, is the parallelised modularity maximisation (Parallelised 
> > Louvain Method, PLM) in NetworKit (https://networkit.github.io/). Despite 
> > its theoretical and statistical drawbacks, it delivers good heuristic 
> > evidence for communities in networks, imho. But that depends a lot on what 
> > you want to do.
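> > A minimal PLM sketch with NetworKit (again with a toy graph of my own, 
> > just to show the calls involved):

```python
import networkit as nk

# toy graph: two triangles joined by a single bridge edge
G = nk.Graph(6)
for u, v in [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3), (2, 3)]:
    G.addEdge(u, v)

plm = nk.community.PLM(G, refine=True)  # Parallelised Louvain Method
plm.run()
partition = plm.getPartition()

print(partition.numberOfSubsets())  # number of detected communities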
> >
> >
> > Cheers,
> >
> >
> > Felix
> >
> >
> >
> > Dr. Felix Victor Münch, Postdoc Researcher
> > Leibniz Institute for Media Research | Hans-Bredow-Institut (HBI), Hamburg
> > https://leibniz-hbi.de/
> > https://felixvictor.net
> >
> > > On Friday, Jun 14, 2019 at 10:16 AM, Lietz Haiko 
> > > <[email protected]> wrote:
> > > Hi Jean,
> > >
> > > the answer also depends on how complicated the desired SBM is. A layered 
> > > model takes longer than an unlayered one.
> > >
> > > Modeling a graph with 100k nodes should take a very long time. But I'd 
> > > also be interested in a more informed answer...
> > >
> > > Haiko
> > >
> > >
> > >
> > > From: graph-tool [[email protected]] on behalf of 
> > > Jean Christophe Cazes [[email protected]]
> > > Sent: Friday, 14 June 2019 09:59
> > > To: [email protected]
> > > Subject: [graph-tool] [SBM on Dense Graphs]
> > >
> > >
> > > Hello, I intend to use graph-tool for a big network: 100k+ nodes and 
> > > very dense.
> > >
> > >
> > > The dataset I'm working with at the moment is a ~40–50 GB CSV containing 
> > > vertices and edges as transactions.
> > >
> > >
> > > Is it realistic to try an SBM on such a graph computationally, and would 
> > > it be theoretically useful?
> > >
> > >
> > > If it isn't feasible computationally, how big can my subgraph be for it 
> > > to be feasible?
> > >
> > >
> > > Note: I will rent a Google Cloud Platform VM to do so.
> > >
> > >
> > > Thank you
> > >
> > >
> > >
> > >
> > >
> > > _______________________________________________
> > > graph-tool mailing list
> > > [email protected] (mailto:[email protected])
> > > https://lists.skewed.de/mailman/listinfo/graph-tool
