Hi Dave.

It seems like one way to get these values faster would be to count the number of descendants as the tree is read in to R. This is possible because when the ")" character is reached by the Newick parser, all descendant nodes (and tips) have already been created in memory.

This was simple to add to my tree reading function read.newick() [code here: http://anolis.oeb.harvard.edu/~liam/R-phylogenetics/read.newick/v0.3/read.newick.R]. Now v0.3 of the function returns a modified "phylo" object with the vector $Ndesc containing the number of descendants from each internal and terminal node above the root (in the order of $edge[,2]). This is exactly the same vector as would be obtained by the following commands:

> desc<-sapply(tree$edge[,2],function(x) node.leaves(tree,x))
> Ndesc<-sapply(desc,function(x) length(x))

Note that read.newick() only reads one tree at a time in "phylip" format (and without node labels) - but reading multiple trees or node labels, or reading from "nexus" format, would quite easy to add if you find that this is indeed faster.

In addition, I believe that newick2phylog() creates this list in reading a Newick tree into memory as a "phylog" object. This is stored as phy$Aparam$nlea for "phylog" object phy. Unfortunately, it seems like newick2phylog() is very slow for large trees.

- Liam

--
Liam J. Revell
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://phytools.blogspot.com


On 3/6/2011 12:47 PM, David Bapst wrote:
Hello all,
I'm currently trying to measure a parameter over a large number of
large trees (>1700 tips), and part of this calculation requires
knowing the tip taxa descended from each node (I must compare the
difference in tip values among taxa descended from a node). Because I
must do this many times,  I decided to compare the efficiency of
several methods for doing this in various R libraries with
system.time() (as Liam did recently with some BM simulation functions
in some recent blog posts). As I feel that others may benefit from
this comparison of methods, I am posting the results to this list.

Geiger's node.leaves() is much faster than the alternatives, although
at ~13s a run, it is still not a particularly speedy process. I didn't
need the actual tip.labels, so I took node.leaves() and made it as
lean as possible, to produce node.tips(), below. That cut the run time
down to ~9 sec.

node.tips<-function (phy, node){
        n<- length(phy$tip.label)
        if (node<= n){node}else{
                l<- numeric()
                d<- phy$edge[which(phy$edge[,1]==node),2]
                for(j in d){if(j<= n){l<- c(l, j)}else{l<-c(l, 
node.tips(phy,j))}}
        l}}

If anyone knows of another alternative that might be faster, I would
greatly appreciate any suggestions.
-Dave Bapst, UChicago Geosci

Using the modified node.leaves() function from geiger, node.tips()

system.time(desc<-sapply(edge_end,function(x) node.tips(res_tree,x)))
    user  system elapsed
    9.39    0.00    9.44

Using node.leaves() in geiger

system.time(desc<-sapply(edge_end,function(x) node.leaves(res_tree,x)))
    user  system elapsed
   13.29    0.02   13.34

Using Descendants() in phangorn

system.time(desc<-sapply(edge_end,function(x) Descendants(res_tree,x)))
    user  system elapsed
   75.60    0.10   76.83

Using listTips() in adephylo

system.time(desc_list<-c(as.list(1:Ntip(res_tree)),listTips(res_tree)))
    user  system elapsed
   75.27    2.93   87.28

Using descendants() in phylobase

system.time(desc<-sapply(edge_end,function(x) 
descendants(res_tree4,x,type="tips")))
    user  system elapsed
  149.56    0.67  155.15

Using nodeDecendants() in maticce  (note that translating the tree
into ouchtree format was prohibitively very lengthy)

system.time(res_tree_ou<-ape2ouch(res_tree))
    user  system elapsed
   84.01    0.13   85.34
system.time(desc<-sapply(edge_end,function(x) nodeDescendents(res_tree_ou,x)))
    user  system elapsed
   65.91    0.23   68.47


_______________________________________________
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

Reply via email to