On 12.08.2014 08:07, ayates wrote:
> Hi, I am having some trouble reproducing the performance of
> minimize_nested_blockmodel_dl on large networks as in the publication
> "Hierarchical block structures and high-resolution model selection in large
> networks"? Namely, for large graphs, even with verbose=True, no output is
> generated, but the CPU usage stays at 100% for hours to days. My network
> contains 1.5 million nodes and a sorted adjacency list per node such that I
> can choose the top K edges per node as a parameter. I have tried sampling
> down to 200,000 nodes and K=5, but minimize_nested_blockmodel_dl does not
> seem to be proceeding with computation. CPU is being used at 100% while
> running , and there is plenty of free memory.
>
> Are there options that I can use in minimize_nested_blockmodel_dl to improve
> performance? Other than sampling and limiting K, are there other strategies
> that I can try? I compiled using the parallel computing option and am using
> AWS EC2 instances. The largest network that I have been able to get a result
> for within 24 compute hours on C3 sized EC2 instances has been 100,000
> nodes, K=5. (undirected, average degree about 8)

There are several options in the algorithm which affect performance. The
most important ones are 'epsilon', 'nsweeps', 'nmerge_sweeps' and 'r'
(see the documentation for their meanings). The default values should be
OK for large networks, except for 'epsilon', which has a default value
of '0'. You should try a value of 1e-3, or even 1e-2, depending on the
size of your network.

I recommend you experiment with the algorithm with small networks first
(~10000 nodes, something you can easily try on a laptop) to get a
feeling on how the parameters affect the performance and quality of the
results.

I would also recommend trying minimize_blockmodel_dl(), since it is more
verbose than minimize_nested_blockmodel_dl(), and will give you a better
idea of how the algorithm is progressing. If you use 'verbose=True' you
should be able to see how fast each sweep of the network is performed.

I have no experience with AWS EC2, but I was able to obtain results for
very large networks with a regular dedicated computer cluster, as I
describe in the paper. However, for the networks with many millions of
edges, it does take a while, maybe even a day or two.

Best,
Tiago

--
Tiago de Paula Peixoto <[email protected]>

Attachment: signature.asc
Description: OpenPGP digital signature

_______________________________________________
graph-tool mailing list
[email protected]
http://lists.skewed.de/mailman/listinfo/graph-tool

Reply via email to