On 12.08.2014 08:07, ayates wrote:
> Hi, I am having some trouble reproducing the performance of
> minimize_nested_blockmodel_dl on large networks as in the publication
> "Hierarchical block structures and high-resolution model selection in large
> networks"? Namely, for large graphs, even with verbose=True, no output is
> generated, but the CPU usage stays at 100% for hours to days. My network
> contains 1.5 million nodes and a sorted adjacency list per node such that I
> can choose the top K edges per node as a parameter. I have tried sampling
> down to 200,000 nodes and K=5, but minimize_nested_blockmodel_dl does not
> seem to be proceeding with computation. CPU is being used at 100% while
> running, and there is plenty of free memory.
>
> Are there options that I can use in minimize_nested_blockmodel_dl to improve
> performance? Other than sampling and limiting K, are there other strategies
> that I can try? I compiled using the parallel computing option and am using
> AWS EC2 instances. The largest network that I have been able to get a result
> for within 24 compute hours on C3 sized EC2 instances has been 100,000
> nodes, K=5. (undirected, average degree about 8)
There are several options in the algorithm which affect performance. The most important ones are 'epsilon', 'nsweeps', 'nmerge_sweeps' and 'r' (see the documentation for their meanings). The default values should be OK for large networks, except for 'epsilon', which has a default value of '0'. You should try a value of 1e-3, or even 1e-2, depending on the size of your network.

I recommend you experiment with the algorithm on small networks first (~10000 nodes, something you can easily try on a laptop) to get a feeling for how the parameters affect the performance and the quality of the results. I would also recommend trying minimize_blockmodel_dl(), since it is more verbose than minimize_nested_blockmodel_dl() and will give you a better idea of how the algorithm is progressing. If you use 'verbose=True' you should be able to see how fast each sweep of the network is performed.

I have no experience with AWS EC2, but I was able to obtain results for very large networks with a regular dedicated computer cluster, as I describe in the paper. However, for networks with many millions of edges, it does take a while, maybe even a day or two.

Best,
Tiago

-- 
Tiago de Paula Peixoto <[email protected]>
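
For the small-network experiment suggested above, a reduced test graph has to be built first. The following is only a minimal sketch in plain Python, not part of graph-tool: the function name sample_top_k_edges, the dict-of-lists input format and the rule "keep the top K neighbours that survive the node sampling" are all assumptions made for illustration, based on the sorted per-node adjacency lists described in the question.

```python
import random


def sample_top_k_edges(adjacency, n_nodes, k, seed=0):
    """Return a sorted undirected edge list over a random sample of nodes.

    adjacency: dict mapping node -> list of neighbours, best first
               (assumed input format, not a graph-tool structure).
    n_nodes:   number of nodes to keep in the sample.
    k:         maximum number of edges each kept node contributes.
    """
    rng = random.Random(seed)
    all_nodes = sorted(adjacency)
    kept = set(rng.sample(all_nodes, min(n_nodes, len(all_nodes))))

    edges = set()
    for u in kept:
        picked = 0
        for v in adjacency[u]:
            if picked == k:
                break
            # Skip self-loops and neighbours that fell outside the sample.
            if v in kept and v != u:
                # Store each undirected edge once, smaller endpoint first.
                edges.add((min(u, v), max(u, v)))
                picked += 1
    return sorted(edges)


# Toy input: four nodes, neighbours listed best first.
toy_adjacency = {
    0: [1, 2, 3],
    1: [0, 2],
    2: [0, 1, 3],
    3: [2, 0],
}
toy_edges = sample_top_k_edges(toy_adjacency, n_nodes=4, k=1)
print(toy_edges)
```

Starting with n_nodes around 10000 and a small k keeps each run short enough to compare parameter settings such as 'epsilon' on a laptop. The resulting list of integer pairs can then be loaded into a graph, for example with graph-tool's Graph.add_edge_list, before running the minimization.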
_______________________________________________ graph-tool mailing list [email protected] http://lists.skewed.de/mailman/listinfo/graph-tool
