Re: [R-sig-phylo] compressTipLabel as an option to read.trees()

2016-12-15 Thread Emmanuel Paradis
It seems that .compressTipLabel's running time is proportional to N 
(the number of trees) and to log(n) (n: the number of tips).


R> tr <- rmtree(1e4, 1e4) # takes ~5 minutes
R> system.time(a <- .compressTipLabel(tr))
   user  system elapsed
 20.904   0.376  21.275

R> print(object.size(tr), unit = "Gb")
8.2 Gb
R> print(object.size(a), unit = "Gb")
3 Gb

If I divide N by 10, it takes 10 times less time:

R> tr <- rmtree(1e3, 1e4)
R> system.time(a <- .compressTipLabel(tr))
   user  system elapsed
  2.088   0.000   2.088

Compare this with the run in my previous message, with N=1e4 and n=1000, 
which took 20 times less time (~1.2 sec) than the first run above.
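
As a quick check of the scaling described above, the ratios can be computed directly from the elapsed times reported in this thread. This is only arithmetic on three single runs, taking the earlier run to be 10,000 trees of 1000 tips as its text states; the data frame and variable names are made up for illustration.

```r
# Elapsed times (seconds) reported in this thread for .compressTipLabel().
# N = number of trees, n = number of tips per tree.
timings <- data.frame(
  N       = c(1e4, 1e3, 1e4),
  n       = c(1e4, 1e4, 1e3),
  elapsed = c(21.275, 2.088, 1.161)
)

# Ratio of elapsed times when N drops from 1e4 to 1e3 (n fixed at 1e4).
ratio_N <- timings$elapsed[1] / timings$elapsed[2]

# Ratio of elapsed times when n drops from 1e4 to 1e3 (N fixed at 1e4).
ratio_n <- timings$elapsed[1] / timings$elapsed[3]

print(round(c(ratio_N = ratio_N, ratio_n = ratio_n), 1))
```

Each ratio rests on one pair of runs on one machine, so it should be read as a rough indication rather than a measured complexity.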


I guess reading the tree file (either in Newick or in NEXUS) will take 
much longer than any of these.


Best,

Emmanuel

On 14/12/2016 at 22:44, Yan Wong wrote:

> On 14 Dec 2016, at 20:57, Emmanuel Paradis wrote:
>
> > What is the size of your problem?
>
> Erm, quite large. I am looking at tree comparison metrics for roughly
> 10,000 trees with perhaps 10,000 tips on each, replicated several times.
> The newick files themselves take up gigabytes uncompressed. For this sized
> problem I’m likely to implement my own comparison metrics, but I want to
> trial this out with a tested library before rolling my own.
>
> > Do you use a recent version of ape? This function was improved one or
> > two years ago.
>
> Yes, 4.0.
>
> But I’m happy for the moment to just leave this stuff running for days on
> a server, so it was just a quick suggestion really.
>
> Thanks for the quick reply
>
> Yan

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] compressTipLabel as an option to read.trees()

2016-12-14 Thread Klaus Schliep
Hi Yan,
Joseph was right. In read.nexus you need a TRANSLATE block; a TAXLABELS
block alone is not enough. With one, read.nexus returns the compressed
object and is 10x faster to read in (for 1000 trees with 1000 taxa on my
machine).
There is also the package rncl (an interface to the Nexus Class Library).
It is faster to read in, although the pure R implementation with the
TRANSLATE block is almost as fast.
However, the objects rncl returns are actually quite a bit larger. It also
stores the edge matrix as doubles, which I find dangerous.
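
To illustrate the point about storage mode, here is a minimal base R sketch comparing an integer matrix with the same values held as doubles. The matrix is a made-up stand-in for an edge matrix, not output from rncl or ape, and the suggestion that type-sensitive checks are the risk is only one possible reading of the concern above.

```r
# Hypothetical edge matrix size for a rooted binary tree with 10,000 tips,
# which has 2n - 2 edges.
n_tips  <- 1e4
n_edges <- 2 * n_tips - 2

edge_int <- matrix(seq_len(2 * n_edges), ncol = 2)  # integer storage
edge_dbl <- edge_int + 0                            # same values as doubles

print(c(int = storage.mode(edge_int), dbl = storage.mode(edge_dbl)))

# Doubles use 8 bytes per value, integers 4.
print(object.size(edge_int), units = "Kb")
print(object.size(edge_dbl), units = "Kb")

# The values compare equal, but the objects are not identical, so code that
# relies on identical() or on integer storage can treat them differently.
print(all(edge_int == edge_dbl))
print(identical(edge_int, edge_dbl))
```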
Cheers,
Klaus

On Wed, Dec 14, 2016 at 4:44 PM, Yan Wong  wrote:

>
> On 14 Dec 2016, at 20:57, Emmanuel Paradis 
> wrote:
>
> > What is the size of your problem?
>
> Erm, quite large. I am looking at tree comparison metrics for roughly
> 10,000 trees with perhaps 10,000 tips on each, replicated several times.
> The newick files themselves take up gigabytes uncompressed. For this sized
> problem I’m likely to implement my own comparison metrics, but I want to
> trial this out with a tested library before rolling my own.
>
> > Do you use a recent version of ape? This function was improved one or
> > two years ago.
>
> Yes, 4.0.
>
> But I’m happy for the moment to just leave this stuff running for days on
> a server, so it was just a quick suggestion really.
>
> Thanks for the quick reply
>
> Yan



-- 
Klaus Schliep
Postdoctoral Fellow
Revell Lab, University of Massachusetts Boston
http://www.phangorn.org/


Re: [R-sig-phylo] compressTipLabel as an option to read.trees()

2016-12-14 Thread Yan Wong

On 14 Dec 2016, at 21:06, Emmanuel Paradis  wrote:

> If the trees are in a NEXUS file with a TRANSLATE block, then the output is a 
> compressed list. So applying .compressTipLabel returns the list unmodified 
> (which should be almost instantaneous).

Ah, I see what I was doing wrong. I used a BEGIN TAXA;TAXLABELS ... END; block, 
rather than a TRANSLATE block within the TREES block. The read.nexus() function 
now works as Joseph Brown surmised. So the easiest way for me to do this is 
simply to use a nexus format trees file.
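
For anyone else hitting this, a minimal sketch of a trees file with the TRANSLATE block inside the TREES block might look like the following. The tip names, tree strings and variable names are invented for illustration, and the read step only runs if ape is installed.

```r
# Hypothetical tip labels and trees. With a TRANSLATE block the trees refer
# to tips by number rather than by name.
tips  <- c("A", "B", "C", "D")
trees <- c("((1,2),(3,4));", "((1,3),(2,4));")

# Build the TRANSLATE entries: "1 A," and so on, no comma after the last one.
translate <- paste0("    ", seq_along(tips), " ", tips,
                    c(rep(",", length(tips) - 1), ""))

nexus_lines <- c(
  "#NEXUS",
  "BEGIN TREES;",
  "  TRANSLATE",
  translate,
  "  ;",
  paste0("  TREE tree", seq_along(trees), " = ", trees),
  "END;"
)

nexus_file <- tempfile(fileext = ".nex")
writeLines(nexus_lines, nexus_file)

# Reading is only attempted when ape is available.
if (requireNamespace("ape", quietly = TRUE)) {
  tr <- ape::read.nexus(nexus_file)
  print(class(tr))
  # A compressed list carries a "TipLabel" attribute.
  print(names(attributes(tr)))
}
```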

Thanks

Yan


Re: [R-sig-phylo] compressTipLabel as an option to read.trees()

2016-12-14 Thread Emmanuel Paradis
If the trees are in a NEXUS file with a TRANSLATE block, then the output 
is a compressed list. So applying .compressTipLabel returns the list 
unmodified (which should be almost instantaneous).


Best,

Emmanuel

On 14/12/2016 at 16:51, Yan Wong wrote:

> On 14 Dec 2016, at 15:33, Joseph W. Brown wrote:
>
> > I wonder if reading in a Nexus file with a translation table bypasses
> > this problem?
>
> Cheers,
>
> If I try read.nexus with a TAXLABELS entry, it still (oddly) results in a
> multiPhylo structure of the same size as before running
> .compressTipLabel. However, when I then do .compressTipLabel() it only
> takes a moment. My guess is this is something to do with skipping the
> renumbering process. It would be nice to have the option in both
> read.nexus and read.tree, so that I don’t have to allocate memory (many
> GB in my case) for the intermediate step.
>
> Thanks
>
> Yan


Re: [R-sig-phylo] compressTipLabel as an option to read.trees()

2016-12-14 Thread Emmanuel Paradis

Hi Yan,

I tried with 10,000 trees each with 1000 tips and it took a bit more 
than 1 sec:


R> tr <- rmtree(1e4, 1000)
R> system.time(a <- .compressTipLabel(tr))
   user  system elapsed
  1.124   0.036   1.161

And yes the memory footprint is substantially decreased:

R> print(object.size(tr), unit="Mb")
850.6 Mb
R> print(object.size(a), unit="Mb")
315.7 Mb
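
The saving comes from keeping one copy of the tip labels for the whole list instead of one copy per tree. The toy sketch below mimics that layout with plain lists in base R; the "trees" are not valid phylo objects, and all names and sizes are made up for illustration.

```r
n_trees <- 100L
n_tips  <- 1000L
labels  <- paste0("t", seq_len(n_tips))
n_edges <- 2L * n_tips - 2L

# Toy stand-in for one tree: an integer edge matrix plus its own label vector.
make_toy_tree <- function() {
  list(edge = matrix(sample.int(2L * n_edges), ncol = 2L),
       tip.label = labels)
}

set.seed(1)
uncompressed <- replicate(n_trees, make_toy_tree(), simplify = FALSE)

# "Compressed" layout: drop the per-tree labels and keep a single copy
# as an attribute of the list.
compressed <- lapply(uncompressed, function(tree) tree["edge"])
attr(compressed, "TipLabel") <- labels

print(object.size(uncompressed), units = "Mb")
print(object.size(compressed), units = "Mb")
```

Note that object.size() counts the label vector once per tree whether or not R shares it internally, so this mirrors how the figures above are reported rather than measuring actual allocation.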

What is the size of your problem?

Do you use a recent version of ape? This function was improved one or 
two years ago.


Best,

Emmanuel

On 14/12/2016 at 16:16, Yan Wong wrote:

> Hi,
>
> I’m reading in a large number of newick trees with the same tips, all from a
> single file. If I do trees<-read.trees() followed by trees <-
> .compressTipLabel(trees), it reduces the memory footprint well, but takes an
> age to run. I can’t help thinking this could be sped up during the reading
> process by passing an option to read.trees() to specify that the tip labels
> are the same in each tree in the multiPhylo object. Has anyone implemented
> such an option?
>
> Cheers
>
> Yan


Re: [R-sig-phylo] compressTipLabel as an option to read.trees()

2016-12-14 Thread Yan Wong

On 14 Dec 2016, at 15:33, Joseph W. Brown  wrote:

> I wonder if reading in a Nexus file with a translation table bypasses this 
> problem?

Cheers,

If I try read.nexus with a TAXLABELS entry, it still (oddly) results in a 
multiPhylo structure of the same size as before running .compressTipLabel. 
However, when I then do .compressTipLabel() it only takes a moment. My guess is 
this is something to do with skipping the renumbering process. It would be nice 
to have the option in both read.nexus and read.tree, so that I don’t have to 
allocate memory (many GB in my case) for the intermediate step.

Thanks

Yan


Re: [R-sig-phylo] compressTipLabel as an option to read.trees()

2016-12-14 Thread Joseph W. Brown
I wonder if reading in a Nexus file with a translation table bypasses this 
problem?

JWB

Joseph W. Brown
Post-doctoral Researcher, Smith Laboratory
University of Michigan
Department of Ecology & Evolutionary Biology
Room 2071, Kraus Natural Sciences Building
Ann Arbor MI 48109-1079
josep...@umich.edu



> On 14 Dec, 2016, at 10:16, Yan Wong  wrote:
> 
> Hi,
> 
> I’m reading in a large number of newick trees with the same tips, all from a 
> single file. If I do trees<-read.trees() followed by trees <- 
> .compressTipLabel(trees), it reduces the memory footprint well, but takes an 
> age to run. I can’t help thinking this could be sped up during the reading 
> process by passing an option to read.trees() to specify that the tip labels 
> are the same in each tree in the multiPhylo object. Has anyone implemented 
> such an option?
> 
> Cheers
> 
> Yan

[R-sig-phylo] compressTipLabel as an option to read.trees()

2016-12-14 Thread Yan Wong
Hi,

I’m reading in a large number of newick trees with the same tips, all from a 
single file. If I do trees<-read.trees() followed by trees <- 
.compressTipLabel(trees), it reduces the memory footprint well, but takes an 
age to run. I can’t help thinking this could be sped up during the reading 
process by passing an option to read.trees() to specify that the tip labels are 
the same in each tree in the multiPhylo object. Has anyone implemented such an 
option?
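
For reference, the two-step workflow I am describing looks roughly like this on a tiny made-up example. It uses ape's read.tree(); the file contents and variable names are invented for illustration, and the ape calls only run if the package is installed.

```r
# Two hypothetical Newick trees sharing the same four tips, one per line.
newick <- c("((A,B),(C,D));", "((A,C),(B,D));")

tree_file <- tempfile(fileext = ".tre")
writeLines(newick, tree_file)

if (requireNamespace("ape", quietly = TRUE)) {
  # Step 1: read all trees; each tree gets its own tip.label vector.
  trees <- ape::read.tree(tree_file)

  # Step 2: compress, so the labels are stored once for the whole list.
  compressed <- ape::.compressTipLabel(trees)

  print(object.size(trees))
  print(object.size(compressed))
}
```

With only two small trees the difference is negligible; the intermediate uncompressed object is what becomes costly at thousands of trees and tips.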

Cheers

Yan