It seems that .compressTipLabel's running times are proportional to N
(number of trees) and to log(n) (n: number of tips).
R> tr <- rmtree(1e4, 1e4) # takes ~5 minutes
R> system.time(a <- .compressTipLabel(tr))
user system elapsed
20.904 0.376 21.275
R> print(object.size(tr), unit = "Gb")
8.2 Gb
R> print(object.size(a), unit = "Gb")
3 Gb
If I divide N by 10, it takes 10 times less time:
R> tr <- rmtree(1e3, 1e4)
R> system.time(a <- .compressTipLabel(tr))
user system elapsed
2.088 0.000 2.088
For comparison, the run in my previous message with N = 10000 and n = 1000
took about 20 times less time (~1.2 sec) than the first run above.
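A quick way to put numbers on these comparisons is to compute the ratios directly from the elapsed times quoted in this thread. The sketch below uses only those three figures (the ~1.2 sec value is approximate, and the variable names are just illustrative). It reports the speed-up for a tenfold reduction in N or in n, together with the apparent exponent k in time ~ size^k. With only three timings this is indicative at best.

```r
# Elapsed times (seconds) quoted in this thread
t_N1e4_n1e4 <- 21.275  # N = 1e4 trees, n = 1e4 tips
t_N1e3_n1e4 <- 2.088   # N = 1e3 trees, n = 1e4 tips
t_N1e4_n1e3 <- 1.2     # N = 1e4 trees, n = 1e3 tips (approximate, earlier message)

# Speed-up when N is divided by 10 at fixed n
ratio_N <- t_N1e4_n1e4 / t_N1e3_n1e4
# Speed-up when n is divided by 10 at fixed N
ratio_n <- t_N1e4_n1e4 / t_N1e4_n1e3

# Apparent exponent k in time ~ size^k for a tenfold change in size
# (k near 1 means roughly linear scaling)
exp_N <- log10(ratio_N)
exp_n <- log10(ratio_n)

print(round(c(ratio_N = ratio_N, ratio_n = ratio_n,
              exp_N = exp_N, exp_n = exp_n), 2))
```

Timings taken on a different machine, or with a different version of ape, would of course give different ratios.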
I guess reading the tree file (either in Newick or in NEXUS) will take
much longer than any of these.
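This guess can be checked on a small case before committing to the full-sized problem. The following is a minimal sketch, assuming ape is installed. The sizes (100 trees of 100 tips) and the temporary Newick file are arbitrary choices for a quick trial, not the sizes discussed above, and the timings will depend on the machine.

```r
library(ape)

set.seed(1)
N <- 100  # number of trees (small, assumed size for a quick trial)
n <- 100  # number of tips per tree

tr <- rmtree(N, n)

# Write the trees in Newick format to a temporary file
f <- tempfile(fileext = ".tre")
write.tree(tr, file = f)

# Time reading the file back, then compressing the tip labels
t_read <- system.time(tr2 <- read.tree(f))
t_comp <- system.time(a <- .compressTipLabel(tr2))

# Compare user, system and elapsed times side by side
print(rbind(read.tree = t_read[1:3], compressTipLabel = t_comp[1:3]))

unlink(f)  # remove the temporary file
```

Increasing N and n step by step should show how both timings grow.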
Best,
Emmanuel
On 14/12/2016 at 22:44, Yan Wong wrote:
On 14 Dec 2016, at 20:57, Emmanuel Paradis <emmanuel.para...@ird.fr> wrote:
What is the size of your problem?
Erm, quite large. I am looking at tree comparison metrics for roughly 10,000
trees with perhaps 10,000 tips on each, replicated several times. The newick
files themselves take up gigabytes uncompressed. For a problem of this size I’m
likely to implement my own comparison metrics, but I want to trial this out
with a tested library before rolling my own.
Do you use a recent version of ape? This function was improved one or two years
ago.
Yes, 4.0.
But I’m happy for the moment to just leave this stuff running for days on a
server, so it was just a quick suggestion really.
Thanks for the quick reply
Yan
_______________________________________________
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/