Wow, I'll try it and report my results and computing times if you want :)
Thank you for the answers and for graph_tool!

On Fri, Jun 14, 2019 at 13:46, Tiago de Paula Peixoto <[email protected]> wrote:

> Hi,
>
> The important quantity is the number of edges E, not the number of nodes.
>
> The functions minimize_nested_blockmodel_dl() and minimize_blockmodel_dl()
> have a complexity of O(E log² N), but the nested version takes longer.
>
> As was already mentioned, the nested model has a large memory footprint,
> since several state copies need to be kept.
>
> There is an alternative approach that takes far less memory, which is to
> instantiate a NestedBlockState object and just call multiflip_mcmc_sweep()
> until equilibration. This is very competitive in quality with the
> functions above, but tends to finish much quicker and to use less memory.
>
> This is covered in the HOWTO:
>
>
> https://graph-tool.skewed.de/static/doc/demos/inference/inference.html#id15
>
> (Remember to replace mcmc_sweep() with multiflip_mcmc_sweep() in the code
> there; I had forgotten to make this change in the last release...)
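>
> A minimal sketch of this approach (the graph here is just a placeholder,
> and constructing NestedBlockState(g) without an initial partition assumes
> a recent graph-tool):
>
>     import graph_tool.all as gt
>     import numpy as np
>
>     g = gt.collection.data["power"]  # placeholder; use your own graph
>
>     state = gt.NestedBlockState(g)  # default initial hierarchy
>
>     # Zero-temperature (beta=inf) merge-split sweeps; repeat until the
>     # description length stops improving, i.e. equilibration.
>     for i in range(1000):
>         state.multiflip_mcmc_sweep(niter=10, beta=np.inf)
>
>     print(state.entropy())  # description length of the final fit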
>
> Best,
> Tiago
>
>
> On 14.06.19 at 12:06, Felix Victor Münch wrote:
> > No worries. I'd like to add that, as Haiko suggests, the non-hierarchical
> > SBM is much quicker and needs orders of magnitude less RAM (a minimal
> > call is sketched below). So it might also run on a recently bought
> > laptop in very acceptable time, even at your network scale. But I'm not
> > sure how network density might affect this.
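> >
> > For reference, a minimal flat-SBM call (just a sketch; "g" stands for
> > your graph):
> >
> >     import graph_tool.all as gt
> >
> >     state = gt.minimize_blockmodel_dl(g)  # non-hierarchical SBM fit
> >     b = state.get_blocks()  # vertex property map with block labels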
> >
> > Cheers,
> >
> > Felix
> >
> >
> >
> >     On Friday, Jun 14, 2019 at 11:59 AM, Jean Christophe Cazes
> >     <[email protected]> wrote:
> >     Thank you so much for those quick answers!
> >
> >     On Fri, Jun 14, 2019 at 11:57, Felix Victor Münch
> >     <[email protected]> wrote:
> >
> >         Hi Jean,
> >
> >
> >         I did a hierarchical SBM on a Twitter follow network (actually
> >         pretty sparse in comparison to other kinds of networks) with ca.
> >         250k accounts. It took about "11 days for one run, with 60 vCPUs
> >         and a peak usage of ca. 400 GB RAM" (my PhD thesis,
> >         https://eprints.qut.edu.au/125543/, p. 251). How theoretically
> >         useful it is depends on your goals and discipline; for my case,
> >         feel free to read the referenced chapter and decide for yourself.
> >
> >         I also did runs on smaller networks with ca. 100k-150k nodes.
> >         Running time was still in the same ballpark, or even longer …
> >         I guess because the network structure was more complex.
> >
> >         That amount of RAM is pretty much mandatory. Swap memory won't
> >         help you much, as it slows the algorithm down to a degree that
> >         makes the running time infeasible. The number of CPUs could
> >         probably be lower, because much of the algorithm seemed to run
> >         serially, and those serial parts made up most of the calculation
> >         time.
> >
> >         I had a university/QRIScloud (https://www.qriscloud.org.au/)
> >         provided VM with 60 cores and 900 GB RAM. On a Google VM this
> >         would have been pretty costly. I adjusted epsilon for less
> >         accuracy and greater speed:
> >
> >             # A larger epsilon loosens the MCMC equilibration criterion:
> >             # faster, but less accurate. 'core' is the graph being fit.
> >             state = gt.inference.minimize_nested_blockmodel_dl(
> >                 core, verbose=True, mcmc_equilibrate_args={'epsilon': 1e-2})
> >
> >         (https://github.com/FlxVctr/PhD-code/blob/master/1000%2B%20nested%20SBM.ipynb)
> >
> >         If I hadn't had the computing resources for free, I wouldn't
> >         have done it.
> >
> >         If you're looking for a more efficient alternative, I'd
> >         recommend testing Infomap (https://www.mapequation.org/index.html),
> >         which also works by entropy minimization (even though it's more
> >         flow-oriented); a minimal sketch follows below. Just yesterday I
> >         did a (non-hierarchical) community detection on a 181k-node
> >         network in a matter of seconds on last year's MacBook Pro. I
> >         don't know how long it takes for a hierarchical structure, but it
> >         can do that too, so it's worth a try.
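> >
> >         A minimal sketch with the infomap Python package (my assumption;
> >         the toy edges are only for illustration):
> >
> >             # pip install infomap
> >             from infomap import Infomap
> >
> >             im = Infomap("--two-level --silent")
> >             for u, v in [(0, 1), (1, 2), (2, 0), (2, 3)]:  # toy edges
> >                 im.add_link(u, v)
> >             im.run()  # minimises the map equation (a description length)
> >             print(im.num_top_modules, im.codelength)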
> >
> >         Also efficient, but with all the drawbacks that Tiago Peixoto
> >         elaborates on in his papers (which I also refer to in the chapter
> >         linked above), and not hierarchical, is the parallelised
> >         modularity maximisation (Parallelised Louvain Method, PLM) in
> >         NetworKit (https://networkit.github.io/); see the sketch below.
> >         Despite its theoretical and statistical drawbacks, it delivers
> >         good heuristic evidence for communities in networks, imho. But
> >         that depends a lot on what you want to do.
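> >
> >         A rough PLM sketch in NetworKit (the file name and edge-list
> >         format are assumptions; adapt them to your data):
> >
> >             import networkit as nk
> >
> >             # whitespace-separated, zero-indexed edge list (assumed)
> >             G = nk.readGraph("edges.txt", nk.Format.EdgeListSpaceZero)
> >
> >             plm = nk.community.PLM(G, refine=True)  # parallel Louvain
> >             plm.run()
> >             partition = plm.getPartition()
> >             print(partition.numberOfSubsets(), "communities")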
> >
> >
> >         Cheers,
> >
> >
> >         Felix
> >
> >
> >
> >         *Dr. Felix Victor Münch*
> >         Postdoc Researcher
> >         Leibniz Institute for Media Research | Hans-Bredow-Institut (HBI),
> >         Hamburg
> >         https://leibniz-hbi.de/
> >         https://felixvictor.net
> >
> >
> >             On Friday, Jun 14, 2019 at 10:16 AM, Lietz Haiko
> >             <[email protected]> wrote:
> >             Hi Jean,
> >
> >             the answer also depends on how complicated the desired SBM
> >             is. A layered model takes longer than an unlayered one.
> >
> >             Modeling a graph with 100k nodes shouldn't take very long.
> >             But I'd also be interested in a more informed answer...
> >
> >             Haiko
> >
> >
> >
> >
>  ----------------------------------------------------------------------------
> >             *From:* graph-tool [[email protected]] on behalf of
> >             Jean Christophe Cazes [[email protected]]
> >             *Sent:* Friday, 14 June 2019 09:59
> >             *To:* [email protected]
> >             *Subject:* [graph-tool] [SBM on Dense Graphs]
> >
> >             Hello, I intend to use graph_tool for a big network: 100k+
> >             nodes and very dense.
> >
> >             The dataset I'm working with at the moment is a ~40-50 GB CSV
> >             containing vertices and edges as transactions.
> >
> >             Is it computationally realistic to try an SBM on such a
> >             graph, and would this be theoretically useful?
> >
> >             If it isn't computationally feasible, how big can my subgraph
> >             be for it to become feasible?
> >
> >             Note: I will rent a Google Cloud Platform VM to do so.
> >
> >             Thank you
> >
>
>
> --
> Tiago de Paula Peixoto <[email protected]>
>
_______________________________________________
graph-tool mailing list
[email protected]
https://lists.skewed.de/mailman/listinfo/graph-tool
