Re: [R-sig-phylo] compressTipLabel as an option to read.trees()
It seems that .compressTipLabel's running time is proportional to N (the number of trees) and to log(n) (n: the number of tips).

R> tr <- rmtree(1e4, 1e4) # takes ~5 minutes
R> system.time(a <- .compressTipLabel(tr))
   user  system elapsed
 20.904   0.376  21.275
R> print(object.size(tr), unit = "Gb")
8.2 Gb
R> print(object.size(a), unit = "Gb")
3 Gb

If I divide N by 10, it takes 10 times less time:

R> tr <- rmtree(1e3, 1e4)
R> system.time(a <- .compressTipLabel(tr))
   user  system elapsed
  2.088   0.000   2.088

Compare this with the run in my previous message, with N = 1e4 and n = 1000, which took about 20 times less time (~1.2 sec) than the first run above.

I guess reading the tree file (either in Newick or in NEXUS) will take much longer than any of these.

Best,

Emmanuel

On 14/12/2016 at 22:44, Yan Wong wrote:
> On 14 Dec 2016, at 20:57, Emmanuel Paradis wrote:
> > What is the size of your problem?
>
> Erm, quite large. I am looking at tree comparison metrics for roughly 10,000 trees with perhaps 10,000 tips on each, replicated several times. The newick files themselves take up gigabytes uncompressed. For a problem of this size I’m likely to implement my own comparison metrics, but I want to trial this out with a tested library before rolling my own.
>
> > Do you use a recent version of ape? This function was improved one or two years ago.
>
> Yes, 4.0.
>
> But I’m happy for the moment to just leave this stuff running for days on a server, so it was just a quick suggestion really.
>
> Thanks for the quick reply
>
> Yan

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
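The scaling in N can be checked on much smaller inputs. The following is only a sketch, assuming ape is installed; the helper time_compress is hypothetical (not part of ape), and the sizes are chosen to run in seconds rather than to reproduce the timings above.

```r
library(ape)

# Hypothetical helper (not part of ape): elapsed seconds needed to
# compress N random trees that all share the same n tip labels.
time_compress <- function(N, n) {
  tr <- rmtree(N, n)
  unname(system.time(.compressTipLabel(tr))["elapsed"])
}

set.seed(1)
sizes <- c(50, 100, 200)   # number of trees N; n is fixed at 200 tips
elapsed <- vapply(sizes, time_compress, numeric(1), n = 200)

# One row per value of N; timings depend on the machine
print(data.frame(N = sizes, elapsed = elapsed))
```

At these sizes the timings are noisy, so larger values of N and several repetitions would be needed before reading anything into the ratios.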
Re: [R-sig-phylo] compressTipLabel as an option to read.trees()
Hi Yan,

Joseph was right. In read.nexus you need a TRANSLATE block; a TAXLABELS block alone is not enough. Then read.nexus returns the compressed object and is 10x faster to read in (for 1000 trees with 1000 taxa on my machine).

There is also the package rncl (Nexus Class Library), which is faster to read in, although even the pure R implementation with the TRANSLATE block is almost as fast. However, the objects it returns are actually quite a bit larger. It also stores the edge matrix as doubles, which I find dangerous.

Cheers,
Klaus

--
Klaus Schliep
Postdoctoral Fellow
Revell Lab, University of Massachusetts Boston
http://www.phangorn.org/
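Klaus does not say why a double-valued edge matrix is dangerous, so the sketch below only shows how one might inspect the storage mode and coerce it back to integer. It uses base R and ape only; it does not call rncl, and the double-valued tree is simulated for illustration.

```r
library(ape)

set.seed(1)
tr <- rtree(5)

# Inspect how the edge matrix is stored
print(storage.mode(tr$edge))

# Simulated case (an assumption for illustration): the same tree with
# its edge matrix stored as doubles
tr_dbl <- tr
storage.mode(tr_dbl$edge) <- "double"

# Coerce back to integer; storage.mode<- keeps the matrix dimensions
if (!is.integer(tr_dbl$edge)) storage.mode(tr_dbl$edge) <- "integer"

same_values <- all(tr_dbl$edge == tr$edge)
print(same_values)
```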
Re: [R-sig-phylo] compressTipLabel as an option to read.trees()
On 14 Dec 2016, at 21:06, Emmanuel Paradis wrote:
> If the trees are in a NEXUS file with a TRANSLATE block, then the output is a compressed list. So applying .compressTipLabel returns the list unmodified (which should be almost instantaneous).

Ah, I see what I was doing wrong. I used a BEGIN TAXA; TAXLABELS ... END; block, rather than a TRANSLATE block within the TREES block. The read.nexus() function now works as Joseph Brown surmised. So the easiest way for me to do this is simply to use a NEXUS-format trees file.

Thanks

Yan
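The NEXUS route can be tried on a small example. This is a sketch under stated assumptions: it relies on write.nexus() with translate = TRUE writing a TRANSLATE block, and it checks for the "TipLabel" attribute that ape uses to mark a compressed multiPhylo list. The file and variable names are made up for illustration.

```r
library(ape)

set.seed(1)
trees <- rmtree(20, 50)   # 20 random trees sharing the same 50 tip labels

# Write the trees to a temporary NEXUS file with a TRANSLATE block
nexus_file <- tempfile(fileext = ".nex")
write.nexus(trees, file = nexus_file, translate = TRUE)

# Read them back; the shared labels should be stored once, as an attribute
compressed <- read.nexus(nexus_file)
has_shared_labels <- !is.null(attr(compressed, "TipLabel"))
print(has_shared_labels)

# Compare memory footprints of the two lists
print(object.size(trees), units = "Kb")
print(object.size(compressed), units = "Kb")

unlink(nexus_file)
```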
Re: [R-sig-phylo] compressTipLabel as an option to read.trees()
If the trees are in a NEXUS file with a TRANSLATE block, then the output is a compressed list. So applying .compressTipLabel returns the list unmodified (which should be almost instantaneous).

Best,

Emmanuel
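The claim that an already compressed list comes back unmodified can be checked directly. A minimal sketch, assuming ape is installed; the compressed list is built here with .compressTipLabel itself rather than read from a NEXUS file.

```r
library(ape)

set.seed(1)
a <- .compressTipLabel(rmtree(10, 30))   # compressed multiPhylo list

# Calling the function again on the compressed list
b <- .compressTipLabel(a)

# Compare the two objects
print(identical(a, b))
```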
Re: [R-sig-phylo] compressTipLabel as an option to read.trees()
Hi Yan,

I tried with 10,000 trees each with 1000 tips and it took a bit more than 1 sec:

R> tr <- rmtree(1e4, 1000)
R> system.time(a <- .compressTipLabel(tr))
   user  system elapsed
  1.124   0.036   1.161

And yes, the memory footprint is substantially decreased:

R> print(object.size(tr), unit="Mb")
850.6 Mb
R> print(object.size(a), unit="Mb")
315.7 Mb

What is the size of your problem? Do you use a recent version of ape? This function was improved one or two years ago.

Best,

Emmanuel
Re: [R-sig-phylo] compressTipLabel as an option to read.trees()
On 14 Dec 2016, at 15:33, Joseph W. Brown wrote:
> I wonder if reading in a Nexus file with a translation table bypasses this problem?

Cheers,

If I try read.nexus with a TAXLABELS entry, it still (oddly) results in a multiPhylo structure of the same size as before running .compressTipLabel. However, when I then run .compressTipLabel() it only takes a moment. My guess is that this has something to do with skipping the renumbering process. It would be nice to have the option in both read.nexus and read.tree, so that I don’t have to allocate memory (many GB in my case) for the intermediate step.

Thanks

Yan
Re: [R-sig-phylo] compressTipLabel as an option to read.trees()
I wonder if reading in a Nexus file with a translation table bypasses this problem?

JWB

Joseph W. Brown
Post-doctoral Researcher, Smith Laboratory
University of Michigan
Department of Ecology & Evolutionary Biology
Room 2071, Kraus Natural Sciences Building
Ann Arbor MI 48109-1079
josep...@umich.edu
[R-sig-phylo] compressTipLabel as an option to read.trees()
Hi,

I’m reading in a large number of newick trees with the same tips, all from a single file. If I do trees <- read.tree() followed by trees <- .compressTipLabel(trees), it reduces the memory footprint well, but takes an age to run. I can’t help thinking this could be sped up during the reading process by passing an option to read.tree() to specify that the tip labels are the same in each tree in the multiPhylo object. Has anyone implemented such an option?

Cheers

Yan
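For reference, the two-step workflow described in this message can be sketched on a small temporary Newick file. The file is generated here only for illustration, and the variable names are made up; the functions are those of ape.

```r
library(ape)

set.seed(1)
# Stand-in for a large Newick file: 20 trees, same 50 tips, one per line
newick_file <- tempfile(fileext = ".nwk")
write.tree(rmtree(20, 50), file = newick_file)

# Step 1: read all trees into a multiPhylo list
trees <- read.tree(newick_file)

# Step 2: store the shared tip labels once for the whole list
compressed <- .compressTipLabel(trees)

# Compare memory footprints before and after compression
print(object.size(trees), units = "Kb")
print(object.size(compressed), units = "Kb")

unlink(newick_file)
```

The intermediate object trees is the allocation the message asks to avoid; in this two-step form it exists in full until step 2 has finished.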