It seems that the running time of .compressTipLabel is proportional to N (the number of trees) and to log(n) (n: the number of tips).

R> tr <- rmtree(1e4, 1e4) # takes ~5 minutes
R> system.time(a <- .compressTipLabel(tr))
   user  system elapsed
 20.904   0.376  21.275

R> print(object.size(tr), unit = "Gb")
8.2 Gb
R> print(object.size(a), unit = "Gb")
3 Gb

If I divide N by 10, it takes about one tenth of the time:

R> tr <- rmtree(1e3, 1e4)
R> system.time(a <- .compressTipLabel(tr))
   user  system elapsed
  2.088   0.000   2.088

Compare this with the run in my previous message, with N = 10000 and n = 1000, which took about 20 times less time than the first run above (~1.2 sec).
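
As a rough check on these figures, the sketch below computes the apparent scaling exponent k in "time ~ size^k" from the elapsed times quoted in this thread. The helper name apparent_exponent is only illustrative, the 1.2 sec value is approximate, and with just two runs per dimension the result is indicative at best. An exponent near 1 would correspond to proportionality.

```r
# Elapsed times (seconds) as reported in this thread; the 1.2 s value is approximate.
timings <- data.frame(
  N       = c(1e4, 1e3, 1e4),      # number of trees
  n       = c(1e4, 1e4, 1e3),      # number of tips per tree
  elapsed = c(21.275, 2.088, 1.2)
)

# Hypothetical helper: apparent exponent k in time ~ size^k,
# estimated from two runs that differ in one dimension only.
apparent_exponent <- function(t_big, t_small, size_big, size_small) {
  log(t_big / t_small) / log(size_big / size_small)
}

full <- timings$elapsed[1]  # N = 1e4, n = 1e4
k_N <- apparent_exponent(full, timings$elapsed[2], 1e4, 1e3)  # N varies, n fixed
k_n <- apparent_exponent(full, timings$elapsed[3], 1e4, 1e3)  # n varies, N fixed

print(round(c(k_N = k_N, k_n = k_n), 2))
```

Timing a few more values of n on a smaller N would give a firmer estimate than this two-point comparison.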

I guess reading the tree file (either in Newick or in NEXUS) will take much longer than any of these.

Best,

Emmanuel

On 14/12/2016 at 22:44, Yan Wong wrote:

On 14 Dec 2016, at 20:57, Emmanuel Paradis <emmanuel.para...@ird.fr> wrote:

What is the size of your problem?

Erm, quite large. I am looking at tree comparison metrics for roughly 10,000 
trees with perhaps 10,000 tips on each, replicated several times. The newick 
files themselves take up gigabytes uncompressed. For a problem of this size I’m 
likely to implement my own comparison metrics, but I want to trial this out 
with a tested library before rolling my own.

Do you use a recent version of ape? This function was improved one or two years 
ago.

Yes, 4.0.

But I’m happy for the moment to just leave this stuff running for days on a 
server, so it was just a quick suggestion really.

Thanks for the quick reply

Yan

_______________________________________________
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/