[R-sig-phylo] retrieving simulated ancestral states from sim.char{geiger}
Hi everybody, I have a question regarding the sim.char function in geiger. Does anybody foresee an easy way to retrieve the simulated ancestral states ? (i.e. the values of each trait for all internal nodes of the tree) Does this sound doable ? Any hint about how to do it ? Thanks for your help Seb -- - Sébastien Lavergne Laboratoire d'Ecologie Alpine, UMR-CNRS 5553 Université Joseph Fourier BP 53, 38041 Grenoble Cedex 9, France tel +33 (0)4 76 63 54 50 http://www-leca.ujf-grenoble.fr/membres/lavergne.htm ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] retrieving simulated ancestral states from sim.char{geiger}
I forgot to mention that I am willing to do this with continuous traits. Thanks to those who already answered (about discrete characters unfortunately). Seb Le 04/08/2010 15:03, Sebastien Lavergne a écrit : Hi everybody, I have a question regarding the sim.char function in geiger. Does anybody foresee an easy way to retrieve the simulated ancestral states ? (i.e. the values of each trait for all internal nodes of the tree) Does this sound doable ? Any hint about how to do it ? Thanks for your help Seb -- - Sébastien Lavergne Laboratoire d'Ecologie Alpine, UMR-CNRS 5553 Université Joseph Fourier BP 53, 38041 Grenoble Cedex 9, France tel +33 (0)4 76 63 54 50 http://www-leca.ujf-grenoble.fr/membres/lavergne.htm ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] retrieving simulated ancestral states from sim.char{geiger}
Hi Sebastien, I believe the simulation simply draws random numbers from the appropriate multivariate normal distribution determined by the model's covariance matrix for the tip values. This means that a complete trajectory of trait values in never actually generated. This is much more efficient than simulating the stochastic differential equation step by step along each branch, and achieves the same result. To obtain simulated trait values at the internal nodes, you would again draw from the multivariate normal distribution, but you would want a covariance matrix which was n_nodes by n_nodes, not just n_tips by n_tips. This covariance matrix is relatively straightforward to compute. For instance, in the case of Brownian motion you would first calculate the Brownian rate parameter sigma. The covariance matrix is just sigma^2 times the matrix of divergence times. The matrix of divergence times has entry i,j equal to the length of time from the root until those two nodes diverged (age of their common ancestor measured from the root). (So if they shared no evolutionary history, they have no covariance, and if i=j, the the time is equal to the time since the root, sigma^2 *t, the expected variance). Using this matrix you would just draw from the multivariate normal distribution with this as the covariance matrix. Don't know if code for this exists already (anyone?) but we could write something up to do this if you like. Sounds useful. Hope that made sense, Carl On Wed, Aug 4, 2010 at 6:59 AM, Sebastien Lavergne sebastien.laver...@ujf-grenoble.fr wrote: I forgot to mention that I am willing to do this with continuous traits. Thanks to those who already answered (about discrete characters unfortunately). Seb Le 04/08/2010 15:03, Sebastien Lavergne a écrit : Hi everybody, I have a question regarding the sim.char function in geiger. Does anybody foresee an easy way to retrieve the simulated ancestral states ? (i.e. the values of each trait for all internal nodes of the tree) Does this sound doable ? Any hint about how to do it ? Thanks for your help Seb -- - Sébastien Lavergne Laboratoire d'Ecologie Alpine, UMR-CNRS 5553 Université Joseph Fourier BP 53, 38041 Grenoble Cedex 9, France tel +33 (0)4 76 63 54 50 http://www-leca.ujf-grenoble.fr/membres/lavergne.htm ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo -- Carl Boettiger Population Biology, UC Davis http://two.ucdavis.edu/~cboettig [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] retrieving simulated ancestral states from sim.char{geiger}
Luke wrote: The simulations in Geiger do draw from a mvn distribution but - in an unnecessary step - draw random numbers for every branch and use matrix multiplication to get the tip values. This is kind of dumb and needlessly slow but does give you the right answer. Carl is correct that simply drawing from a mvn distribution is faster and equivalent. If I had time I'd rewrite the function - maybe I will! Without knowing exactly what your R code is doing, I just want to add to these (correct) ideas that there is one simple way of getting tip and interior node values without much more effort. Work up the tree, drawing a set of normals (one for each character) to get the net changes in that branch. Keep going until you reach the tips. (If the characters are not independent, you also need to have on hand a matrix which contains the square root of the covariances among characters, and multiply the values at each node by that.) This way you don't need a nodes x nodes covariance matrix. Joe Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] retrieving simulated ancestral states from sim.char{geiger}
Hi all. Here is some code that I think will do what Dr. Felsenstein has suggested for a single character. It returns a vector containing the simulated states at all internal or terminal nodes, numbered according to the numbers used in tree$edge. Happy to hear if this works. See below: # conducts BM simulation by ascending through the tree from the root node to the tips # by Liam J. Revell sim.bm-function(tree,bm.rate,root.state){ # generate random normal deviates dev-rnorm(n=length(tree$edge.length)) # give them appropriate variances dev-dev*sqrt(bm.rate*tree$edge.length) # sort the tree edges, edge.lengths, and devs ranks-order(tree$edge[,1]) tree$edge-tree$edge[ranks,] tree$edge.length-tree$edge.length[ranks] dev-dev[ranks] # add them up state-matrix(0,length(tree$edge.length),1); for(j in 1:length(tree$edge.length)){ if(tree$edge[j,1]==length(tree$tip)+1) state[j]-root.state+dev[j] else state[j]-state[match(tree$edge[j,1],tree$edge[,2])]+dev[j] } # bookkeeping rownames(state)-tree$edge[,2] state-state[order(as.numeric(rownames(state))),] sim.bm-state } Sincerely, Liam Liam J. Revell NESCent, Duke University web: http://anolis.oeb.harvard.edu/~liam/ NEW email: lrev...@nescent.org Joe Felsenstein wrote: Luke wrote: The simulations in Geiger do draw from a mvn distribution but - in an unnecessary step - draw random numbers for every branch and use matrix multiplication to get the tip values. This is kind of dumb and needlessly slow but does give you the right answer. Carl is correct that simply drawing from a mvn distribution is faster and equivalent. If I had time I'd rewrite the function - maybe I will! Without knowing exactly what your R code is doing, I just want to add to these (correct) ideas that there is one simple way of getting tip and interior node values without much more effort. Work up the tree, drawing a set of normals (one for each character) to get the net changes in that branch. Keep going until you reach the tips. (If the characters are not independent, you also need to have on hand a matrix which contains the square root of the covariances among characters, and multiply the values at each node by that.) This way you don't need a nodes x nodes covariance matrix. Joe Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] retrieving simulated ancestral states from sim.char{geiger}
Thank you all for these nice and very very helpful answers... it seems that I have some work to do for a little while now. I'll give a try to all the hints and pieces of code you sent me, and may get back to you off-list for feedbacks or more precise requests. Thanks again Seb Le 04/08/2010 21:36, Liam J. Revell a écrit : Hi all. Here is some code that I think will do what Dr. Felsenstein has suggested for a single character. It returns a vector containing the simulated states at all internal or terminal nodes, numbered according to the numbers used in tree$edge. Happy to hear if this works. See below: # conducts BM simulation by ascending through the tree from the root node to the tips # by Liam J. Revell sim.bm-function(tree,bm.rate,root.state){ # generate random normal deviates dev-rnorm(n=length(tree$edge.length)) # give them appropriate variances dev-dev*sqrt(bm.rate*tree$edge.length) # sort the tree edges, edge.lengths, and devs ranks-order(tree$edge[,1]) tree$edge-tree$edge[ranks,] tree$edge.length-tree$edge.length[ranks] dev-dev[ranks] # add them up state-matrix(0,length(tree$edge.length),1); for(j in 1:length(tree$edge.length)){ if(tree$edge[j,1]==length(tree$tip)+1) state[j]-root.state+dev[j] else state[j]-state[match(tree$edge[j,1],tree$edge[,2])]+dev[j] } # bookkeeping rownames(state)-tree$edge[,2] state-state[order(as.numeric(rownames(state))),] sim.bm-state } Sincerely, Liam Liam J. Revell NESCent, Duke University web: http://anolis.oeb.harvard.edu/~liam/ NEW email: lrev...@nescent.org Joe Felsenstein wrote: Luke wrote: The simulations in Geiger do draw from a mvn distribution but - in an unnecessary step - draw random numbers for every branch and use matrix multiplication to get the tip values. This is kind of dumb and needlessly slow but does give you the right answer. Carl is correct that simply drawing from a mvn distribution is faster and equivalent. If I had time I'd rewrite the function - maybe I will! Without knowing exactly what your R code is doing, I just want to add to these (correct) ideas that there is one simple way of getting tip and interior node values without much more effort. Work up the tree, drawing a set of normals (one for each character) to get the net changes in that branch. Keep going until you reach the tips. (If the characters are not independent, you also need to have on hand a matrix which contains the square root of the covariances among characters, and multiply the values at each node by that.) This way you don't need a nodes x nodes covariance matrix. Joe Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo -- - Sébastien Lavergne Laboratoire d'Ecologie Alpine, UMR-CNRS 5553 Université Joseph Fourier BP 53, 38041 Grenoble Cedex 9, France tel +33 (0)4 76 63 54 50 http://www-leca.ujf-grenoble.fr/membres/lavergne.htm ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo