Re: [R-sig-phylo] size-free morphometrics in phylogenetic framework
Hi Dan, Even though this is well-trodden ground in some respects (by the likes of TG and others), I have often-enough fielded questions about size-correction and principal components analysis on species data (or encountered those who see nothing wrong with ignoring phylogeny in said procedures) that I recently wrote and submitted a Brief Communication to 'Evolution' on the topic entitled Statistical transformations and trees: Phylogenetic size-correction and principal components. In the article, I provide the mathematical details of these procedures, as well as simple R and Matlab code in an appendix. I also analyzed the variance and type I error associated with ignoring phylogeny in these preliminary corrections or data reduction procedures. Although the effect is not incredibly severe, it is large enough to be of significant concern. I'm happy to send this manuscript to anyone who is interested. As I said, it is presently in review, but I'm encouraged by this discussion thread that there seems to be considerable interest in this area. I also performed my type I error analyses using stochastic pure-birth trees. Obviously, the effect on type I error will be increased for more realistic birth-death trees (in which there are more shorter branches towards to tips of the tree). Thanks! - Liam Liam J. Revell Department of Organismic and Evolutionary Biology Harvard University web: http://anolis.oeb.harvard.edu/~liam/ email: lrev...@fas.harvard.edu On Mon, 6 Apr 2009, Luke Harmon wrote: You should reply to the list. Sent from my iPod Begin forwarded message: From: Brian O'Meara bcome...@nescent.org Date: April 6, 2009 9:04:00 AM PDT (CA) To: Dan Rabosky dl...@cornell.edu Cc: r-sig-phylo@r-project.org Subject: Re: [R-sig-phylo] size-free morphometrics in phylogenetic framework Hi, Dan et al. I've been worried about this for PCA. Mesquite has an evolutionary PCA approach, but as far as I know, this hasn't been published anywhere. Assuming the method implemented in Mesquite is done properly (which I think rather likely), it'd be relatively simple to compare the effect of the evolutionary PCA vs conventional approach on real and simulated datasets. Brian On Apr 6, 2009, at 11:53 AM, Dan Rabosky wrote: Howdy folks- I think I'm opening this up for discussion, rather than pointing to a specific problem. Suppose I'm conducting an analysis of phenotypic evolution and need size-free species measurements. A standard approach might be to take species mean values, then regress those values on some index of size (e.g., snout-vent length), and treat the residuals as size-free measurements for further analysis. However, I've also seen explicit incorporation of phylogenetic info into the estimation of size-free variables. For example, Collar et al (Evolution, online early, DOI: 10./j.1558-5646.2009.00626.x) used PIC to estimate regression slopes of traits on size, then forced these slopes on regressions of raw species trait values and used the residuals as size-free variables. Do we have strong feelings about why/whether the latter is the approach that should be used? My intuition, probably incorrect, is that the slopes should be relatively robust regardless of whether phylogeny is used, but not the degrees of freedom. And if this is a serious problem, wouldn't PCA and geometric morphometric approaches also be affected by the assumption that species values represent independent data points? ~Dan Cornell University http://www.eeb.cornell.edu/Rabosky/dan/main.html [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] Trait simulations
Hi Jeremy, Perhaps you've already figured this out. There may be a function available in R to do this, but if not, in principle for BM it should be fairly easy to do this manually. First, read your tree in: tree-read.tree(FILE, etc.) % put filename or paste in tree then compute the phylogenetic VCV matrix (let's call it T): T-vcv.phylo(tree) then specify your desired correlation matrix, R. If you wanted an evolutionary correlation of 0.95 this would be: R-matrix(c(1,0.95,0.95,1),nrow=2,ncol=2) then generate a matrix of random, uncorrelated values (U), e.g.: U-matrix(,nrow=N,ncol=2) % where N is the number of taxa U[,1]-rnorm(N) % (am I using this function correctly?) U[,2]-rnorm(N) now give them a correlation based on R (using the Cholesky decomposite): V-U%*%chol(R) now give the data correlation due to the tree: X-t(V)%*%chol(T) I think that should be it. The data in X should be as data evolved by BM on your tree with the correlation specified in your matrix, R (0.95 in this case). More experience users, please let me know if this is incorrect (it is untested, so it may include errors, but I think the logic is sound). Sincerely, Liam Liam J. Revell Department of Organismic and Evolutionary Biology Harvard University web: http://anolis.oeb.harvard.edu/~liam/ email: lrev...@fas.harvard.edu On Mon, 18 May 2009, jeremy.beaul...@yale.edu wrote: Hi all~ I was just wondering if there is a package or function in R that can simulate two continuous traits with a user-specified correlation coefficient using a known tree topology, branch lengths, and a model of Brownian motion (or even OU, if possible)? All the best, Jeremy Beaulieu ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] Standardising independent contrasts
The contrasts returned by pic() should already be standardized. Standardizing phylogenetic contrasts is performed by dividing each contrast (calculated between internal or terminal nodes) by a sum proportional to the square root of its expected variance. Assuming Brownian motion, for two sister terminal nodes, this is just the square root of the sum of their branch lengths to a common ancestor. (This is because the variance of a difference between two random variables is just the sum of their separate variances, and variance is accumulates in direct proportion to elapsed time under BM.) For contrasts that are deeper in the phylogeny, this sum has to be adjusted to account for uncertainty in the reconstructed values at internal nodes, which adds extra variance to the computed difference. This is described in great detail in Felsenstein (1985; Am Nat 125:1-15). However, as noted above - the values returned by pic() should have already been standardized in this way! You should also not subtract the mean value of the contrasts. This is stated explicitly in Garland et al. (1992; Syst Biol 41:18-32): The expected mean of any set of contrasts is zero because the direction of subtraction is arbitrary..., so only standard deviations are needed. - Liam Liam J. Revell NESCent, Duke University web: http://anolis.oeb.harvard.edu/~liam/ NEW email: lrev...@nescent.org On Tue, 15 Sep 2009, Manabu Sakamoto wrote: Dear list, I am trying to standardise independent contrasts following Garland et al. (1992), but I am unsure just how you would extract the sum of it's [contrast's] branch lenghts in R. I computed independent contrasts using the pic function in ape. I then took the contrasts and subtracted the mean from each one and now I need to divide them by their standard deviations or the square root of the sum of their branch lengths (Garland et al., 1992). This is where I am stuck. I assume the sum of the branch lengths here refer to the sum of the two branch lengths that go into the computation of each contrast? I would greatly appreciate any kind support on this matter. Many thanks in advance, Manabu Sakamoto -- -- M. Sakamoto, PhD Department of Earth Sciences, University of Bristol, Wills Memorial Building, Queen's Road, Bristol BS8 1RJ, UK m.sakam...@bristol.ac.uk ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] Standardising independent contrasts
Hi Manabu, Yes, positivized is the same as taking the absolute value. For some of Garland et al.'s diagnostic tests on contrasts you might need the expected variances of the contrasts (actually, proportional to their expected variance prior to standardization), or the unstandardized contrasts. It is possible to obtain these using specific settings in the pic() function. For example, to obtain the expected variances you just run pic(var.contrasts=TRUE), e.g., cont.X-pic(x,tree,var.contrasts=TRUE) where x is a vector of your phenotypic trait and tree is your phylogeny. cont.X is a matrix with the standardized contrasts in the first column and variances in the second column. Similarly: cont.x-pic(x,tree,scaled=FALSE) will return a vector (cont.x) containing the unstandardized contrasts (although in most cases you don't want these). If you compute: cont.X-pic(x,tree,scaled=FALSE,var.contrasts=TRUE) cont.x-cont.X[,1]/sqrt(cont.X[,2]) you will get the same vector of contrasts as you would obtain with: cont.x-pic(x,tree) - Liam Liam J. Revell NESCent, Duke University web: http://anolis.oeb.harvard.edu/~liam/ NEW email: lrev...@nescent.org Manabu Sakamoto wrote: Hi Liam, Thanks very much. Now, that makes sense that pic() contrasts are already standardised, because I was getting strange results compared to Mesquite when I tried to further standardise the contrasts... As I am at it, I thought perhaps you could help me interpret what Garland et al. (1992) referred to when they discuss positivizing stardised contrasts. I thought perhaps it was just simply taking the absolute values of the contrasts because the signs are arbitrary but I was not sure because of the specific wording in that paper... Many thanks, Manabu -- M. Sakamoto, PhD Department of Earth Sciences, University of Bristol, Wills Memorial Building, Queen's Road, Bristol BS8 1RJ, UK m.sakam...@bristol.ac.uk Liam J. Revell さんは書きました: The contrasts returned by pic() should already be standardized. Standardizing phylogenetic contrasts is performed by dividing each contrast (calculated between internal or terminal nodes) by a sum proportional to the square root of its expected variance. Assuming Brownian motion, for two sister terminal nodes, this is just the square root of the sum of their branch lengths to a common ancestor. (This is because the variance of a difference between two random variables is just the sum of their separate variances, and variance is accumulates in direct proportion to elapsed time under BM.) For contrasts that are deeper in the phylogeny, this sum has to be adjusted to account for uncertainty in the reconstructed values at internal nodes, which adds extra variance to the computed difference. This is described in great detail in Felsenstein (1985; Am Nat 125:1-15). However, as noted above - the values returned by pic() should have already been standardized in this way! You should also not subtract the mean value of the contrasts. This is stated explicitly in Garland et al. (1992; Syst Biol 41:18-32): The expected mean of any set of contrasts is zero because the direction of subtraction is arbitrary..., so only standard deviations are needed. - Liam Liam J. Revell NESCent, Duke University web: http://anolis.oeb.harvard.edu/~liam/ NEW email: lrev...@nescent.org On Tue, 15 Sep 2009, Manabu Sakamoto wrote: Dear list, I am trying to standardise independent contrasts following Garland et al. (1992), but I am unsure just how you would extract the sum of it's [contrast's] branch lenghts in R. I computed independent contrasts using the pic function in ape. I then took the contrasts and subtracted the mean from each one and now I need to divide them by their standard deviations or the square root of the sum of their branch lengths (Garland et al., 1992). This is where I am stuck. I assume the sum of the branch lengths here refer to the sum of the two branch lengths that go into the computation of each contrast? I would greatly appreciate any kind support on this matter. Many thanks in advance, Manabu Sakamoto -- -- M. Sakamoto, PhD Department of Earth Sciences, University of Bristol, Wills Memorial Building, Queen's Road, Bristol BS8 1RJ, UK m.sakam...@bristol.ac.uk ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] Standardising independent contrasts
Ted Garland made the very good point to me that positivization is not the same as taking the absolute value of contrasts when several traits are being considered. If, for example, one is inclined to positivize the contrasts for two traits with respect to trait x, then a pair of contrasts (x,y) = (-1,2) will become (1,-2) - not (1,2). The same effect as positivization can be achieved by rotating the descendant branches of a node so that all contrasts in a given trait are positive (obviously, in this case some or many contrasts for another trait can be negative). To my knowledge, however, diagnostic tests on contrasts usually use absolute values (or squares) of contrasts. In general, these should not be used in statistical analyses such as correlation (although positivized contrasts may be used with no ill effect). - Liam Liam J. Revell NESCent, Duke University web: http://anolis.oeb.harvard.edu/~liam/ NEW email: lrev...@nescent.org On Wed, 16 Sep 2009, Liam J. Revell wrote: Hi Manabu, Yes, positivized is the same as taking the absolute value. For some of Garland et al.'s diagnostic tests on contrasts you might need the expected variances of the contrasts (actually, proportional to their expected variance prior to standardization), or the unstandardized contrasts. It is possible to obtain these using specific settings in the pic() function. For example, to obtain the expected variances you just run pic(var.contrasts=TRUE), e.g., cont.X-pic(x,tree,var.contrasts=TRUE) where x is a vector of your phenotypic trait and tree is your phylogeny. cont.X is a matrix with the standardized contrasts in the first column and variances in the second column. Similarly: cont.x-pic(x,tree,scaled=FALSE) will return a vector (cont.x) containing the unstandardized contrasts (although in most cases you don't want these). If you compute: cont.X-pic(x,tree,scaled=FALSE,var.contrasts=TRUE) cont.x-cont.X[,1]/sqrt(cont.X[,2]) you will get the same vector of contrasts as you would obtain with: cont.x-pic(x,tree) - Liam Liam J. Revell NESCent, Duke University web: http://anolis.oeb.harvard.edu/~liam/ NEW email: lrev...@nescent.org Manabu Sakamoto wrote: Hi Liam, Thanks very much. Now, that makes sense that pic() contrasts are already standardised, because I was getting strange results compared to Mesquite when I tried to further standardise the contrasts... As I am at it, I thought perhaps you could help me interpret what Garland et al. (1992) referred to when they discuss positivizing stardised contrasts. I thought perhaps it was just simply taking the absolute values of the contrasts because the signs are arbitrary but I was not sure because of the specific wording in that paper... Many thanks, Manabu -- M. Sakamoto, PhD Department of Earth Sciences, University of Bristol, Wills Memorial Building, Queen's Road, Bristol BS8 1RJ, UK m.sakam...@bristol.ac.uk Liam J. Revell さんは書きました: The contrasts returned by pic() should already be standardized. Standardizing phylogenetic contrasts is performed by dividing each contrast (calculated between internal or terminal nodes) by a sum proportional to the square root of its expected variance. Assuming Brownian motion, for two sister terminal nodes, this is just the square root of the sum of their branch lengths to a common ancestor. (This is because the variance of a difference between two random variables is just the sum of their separate variances, and variance is accumulates in direct proportion to elapsed time under BM.) For contrasts that are deeper in the phylogeny, this sum has to be adjusted to account for uncertainty in the reconstructed values at internal nodes, which adds extra variance to the computed difference. This is described in great detail in Felsenstein (1985; Am Nat 125:1-15). However, as noted above - the values returned by pic() should have already been standardized in this way! You should also not subtract the mean value of the contrasts. This is stated explicitly in Garland et al. (1992; Syst Biol 41:18-32): The expected mean of any set of contrasts is zero because the direction of subtraction is arbitrary..., so only standard deviations are needed. - Liam Liam J. Revell NESCent, Duke University web: http://anolis.oeb.harvard.edu/~liam/ NEW email: lrev...@nescent.org On Tue, 15 Sep 2009, Manabu Sakamoto wrote: Dear list, I am trying to standardise independent contrasts following Garland et al. (1992), but I am unsure just how you would extract the sum of it's [contrast's] branch lenghts in R. I computed independent contrasts using the pic function in ape. I then took the contrasts and subtracted the mean from each one and now I need to divide them by their standard deviations or the square root of the sum of their branch lengths (Garland et al., 1992). This is where I am stuck. I assume the sum of the branch lengths here refer to the sum
Re: [R-sig-phylo] inverting C matrices from Yule trees
Hi Santiago, The problem is that this function, birth.death(,,taxa.stop=X), simulates speciation and extinction under a Yule process until it gets to taxa=X and then stops, without adding any length to the branches leading to the last two taxa. This means that the distances from the root of the tree to taxa 9 10 and their common ancestor are all equal and the matrix returned by vcv.phylo() is exactly singular. A solution is to simulate a tree with taxa=X+1 and then drop the last tip using the function drop.tip(). In your example this would be: solve(vcv.phylo(drop.tip(birthdeath.tree(1,0, taxa.stop=11),11))); This seems to work. - Liam Liam J. Revell NESCent, Duke University web: http://anolis.oeb.harvard.edu/~liam/ NEW email: lrev...@nescent.org Santiago Claramunt wrote: Hi all, I'm having problems computing the inverse of the expected variance-covariance evolutionary matrix (C) on simulated trees generated with birthdeath.tree(). solve(vcv.phylo(birthdeath.tree(1,0, taxa.stop=10))) These matrices seem to be singular! Matrices generated using rtree() and rcoal() are fine. Any insights on this? Cheers, Santiago Santiago Claramunt Museum of Natural Science, 119 Foster Hall, Louisiana State University, Baton Rouge, LA70803 scla...@tigers.lsu.edu http://www.museum.lsu.edu/Claramunt/Home.html ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] correlation between discrete and continuous variables?
Hi Christoph, Stacey, This might be a problem for which a logistic regression (http://en.wikipedia.org/wiki/Logistic_regression) would be appropriate. If so, I encourage you to check out the following article, which I enjoyed: Ives, A. R., and T. Garland Jr. 2009. Phylogenetic logistic regression for binary dependent variables. Systematic Biology (Advance Access). URL: http://sysbio.oxfordjournals.org/cgi/content/abstract/syp074v1. Others on this list-serve might also be well suited to comment on this paper method. Sincerely, Liam Liam J. Revell NESCent, Duke University web: http://anolis.oeb.harvard.edu/~liam/ NEW email: lrev...@nescent.org sd...@duke.edu wrote: Hi Christoph, There was some discussion following a similar question a year ago: The question: https://stat.ethz.ch/pipermail/r-sig-phylo/attachments/20080402/2e2b995e/attachment.pl The threads: https://stat.ethz.ch/pipermail/r-sig-phylo/2008-April/thread.html#35 I'll be interested to hear what approach you take, so let us (or me anyway) know. Stacey Quoting Christoph Heibl christoph.he...@gmx.net: A question to the comparative method gurus: I have a phylogeny for a group of plants that repeatedly evolved annuality from perennial ancestors. For each tip in the tree I have characterized the climatic tolerances (e.g. temperature, precipitation, ...) . I would like to assess if there is a significant correlation between life cycle (discrete) and climate variables (continuous). I am not sure how to start with this and would appreciate your opinion! Regards, Christoph ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] Estimate correlation coefficient of a linear GLS model
Thanks. That is very helpful information, actually! - Liam Liam J. Revell NESCent, Duke University web: http://anolis.oeb.harvard.edu NEW email: lrev...@nescent.org -Original Message- From: Emmanuel Paradis emmanuel.para...@mpl.ird.fr Sent: Monday, January 04, 2010 7:51 AM To: Liam J. Revell lrev...@nescent.org Cc: Christian Arnold chrarn...@web.de; r-sig-phylo@r-project.org Subject: Re: [R-sig-phylo] Estimate correlation coefficient of a linear GLS model Liam J. Revell wrote on 28/12/2009 04:06: Hi Christian, There was a previous discussion on the coefficient of determination (R^2) for PGLS. It can be found in the Oct-2009 R-sig-phylo archive, here: https://stat.ethz.ch/pipermail/r-sig-phylo/2009-October/thread.html ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] Transforming a data.frame into a phylogenetic tree
Hi Timothee, Assuming that I figured out your tree format correctly, I have written you a function to translate your tree-as-data-frame into an object of class phylo in R. I attach the disclaimer that this is programmed very inelegantly and may contain errors. It also contains no method for checking your tree file for errors, thus will probably crash if your tree doesn't make any sense (say, for example, contains floating branches not connected to any other nodes). It assumes that you have installed APE and that, for example (taking your table, below), node is the ancestor to node 1000 and 0001 (as well as several other nodes) with the branch length of 27 leading to node 1000 and 83 leading to 0001. If this is the correct interpretation of your format, the original data file, below, yields a Newick tree of ((5:78,9:140):27,11:185,12:196,10:176,7:159); where the taxon labels are determined by the row labels of the data frame. (Note that this tree is highly polytomous at the root, because a number of nodes in the data frame, e.g., node 1001, leave one and only one descendant.) The input is just the table given below, with the lines above Anc OffSpr. . . etc. excluded, which you can read in as follows: tree.data-read.table(filename,colClasses=character) Then you can load the function from source, as follows: source(tree.read.R) # this name is just to distinguish it from read.tree(), an ape function. And run it as follows: tree-tree.read(tree.data) Trees can be printed to file using, for example: write.tree(tree,filename) The function is as follows: tree.read-function(tree.data){ n.tips=0; k=1; tip.label-NA; edge-matrix(0,nrow(tree.data)-1,2); edge.length-NA; for (i in 1:nrow(tree.data)){ external=1; for (j in 1:nrow(tree.data)){ if(tree.data[i,2]==tree.data[j,1]){ external=0; } } n.tips-n.tips+external; if(external==1){ edge[i-1,2]-n.tips; tip.label[k]-row.names(tree.data[i,]); k=k+1; } } node.number-n.tips; for (i in 2:nrow(tree.data)){ for(j in 1:2){ if(edge[i-1,j]==0){ internal-tree.data[i,j]; node.number-node.number+1; for(k in i:nrow(tree.data)){ for(l in 1:2){ if(tree.data[k,l]==internal){ edge[k-1,l]=node.number; } } } } # find branch length if(j==2){ anc.height=0; for(k in 1:nrow(tree.data)){ if(tree.data[i,1]==tree.data[k,2]){ anc.height-as.numeric(tree.data[k,3]); } } edge.length[i-1]-as.numeric(tree.data[i,3])-anc.height; } } } Nnode-node.number-n.tips; obj-list(edge=edge,tip.label=tip.label,edge.length=edge.length,Nnode=Nnode); class(obj) - phylo; obj-collapse.singles(obj); obj; } I hope this works! - Liam Liam J. Revell NESCent, Duke University web: http://anolis.oeb.harvard.edu/~liam/ NEW email: lrev...@nescent.org Timothee POISOT wrote: Dear users, I am currently doing a model of community assembly, and I would like to integrate phylogenetic information. For each organism, I know its ancestor and the time at which speciation occured. These informations are stored in a data.frame, an example of which is here : PhyloTree Anc OffSpr TimeApp 1 Anc 1 2 1000 28 3 0001 84 4 0010 101 5 1000 1010 106 6 0010 139 7 0100 160 8 0001 1001 161 9 1000 1100 168 10 0010 0011 177 11 1001 10010001 186 12 0010 0110 197 Do you have any idea about how I can proceed to transform this object into a tree object, such as the one used by APE? Regards, Timothée ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] transition matrix in geiger (sim.char)
Hi Alex, The columns in your matrix should sum to 0.0, so if you want the transitions to any of the four states to be equiprobable, try: q-list(rbind(c(-1, 1/3, 1/3, 1/3), c(1/3, -1, 1/3, 1/3), c(1/3, 1/3, -1, 1/3), c(1/3, 1/3, 1/3, -1))); Then type: results-sim.char(tree,q,nsims=1,model=discrete,root.state=1); Note that you don't have to set the root.state, but if not assigned it will assume a root.state of 1. If you want to simulate many characters at once under the same model, you can do that in one of two ways, depending on how you want the output to be stored. You can increase nsims, e.g.: results-sim.char(tree,q,nsims=10,model=discrete,root.state=1); for 10 simulations. Or, you can add additional matrices to the list of transition matrices, q, e.g.: for (i in 2:10){ q[[i]]=q[[1]]; } The latter gives your results in a more convenient form, from which: results-as.matrix(results[,,1]); puts all the results from all 10 simulations into one numspecies x 10 matrix. Good luck! - Liam Liam J. Revell NESCent, Duke University web: http://anolis.oeb.harvard.edu/~liam/ NEW email: lrev...@nescent.org Alexandre Antonelli wrote: Hi, I am trying to use the function sim.char in geiger without success. I would like to let a character (in this case 'species distribution', with four states) evolve along a tree, and try different transition costs in the matrix. However, I don't manage to get more than one character with 2 states. Any ideas what I am doing wrong? It is very unclear to me how the transition matrix should be compiled (if columns/rows have to sum up to one, etc). The manual indicated that the number of states per character would be dependent on the size of the matrix, but that does not seem to be the case. These are the commands I used, and the transition matrix I intended to apply: ## 0: Area A, 1:Area B, 2:Area C, 3:Area D ## ## From0 1 2 3 ## To 0 -.5 .5 .5 .5 ## 1 .5 -.5 .5 .5 ## 2 .5 .5 -.5 .5 ## 3 .5 .5 .5 -.5 require (geiger) tree - read.nexus(file=input.tre) q-list (rbind( c(-.5, .5, .5, .5), c(.5, -.5, .5, .5), c(.5, .5, -.5, .5), c(.5, .5, .5, -.5))) results-sim.char(tree, q, model=discrete,n=2) -- Thanks for any help! Best wishes, Alex Dr Alexandre Antonelli Institute of Systematic Botany Zollikerstrasse 107 CH 8008 Zürich Switzerland Phone: +41 (0)44 634 8416 Mobile: + 41 (0) 76 7345116 alexandre.antone...@systbot.uzh.ch http://www.systbot.uzh.ch/Personen/PostdoktorandInnen/AlexandreAntonelli.html ** [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] selecting terminal branch lengths?
Try this: # given an object of class phylo called tree # 1. create vector for terminal edge lengths terminal.edges-matrix(NA,length(tree$tip.label)); # 2. copy terminal branches into vector terminal.edges-tree$edge.length[tree$edge[1:(2*length(tree$tip.label)-2),2]=length(tree$tip.label)]; # 3. label terminal branches by tip label names(terminal.edges)-tree$tip.label; I think that should work. - Liam Liam J. Revell NESCent, Duke University web: http://anolis.oeb.harvard.edu/~liam/ NEW email: lrev...@nescent.org mgavil2 wrote: Dear All I am trying to find a way to get a vector of only the terminal branch length for species in a phylogeny, but cant seem to find a way any suggestions? Best, Maria Mercedes ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] estimate likelihood of data for given model and parameters
Hi Carl (also Brian), This can effectively be accomplished in the geiger function fitContinuous() by setting the upper lower bounds for the parameter to both be equal to the value for which you want to calculate the likelihood. Unfortunately, these are not allowed to be exactly the same; however, they can be arbitrarily close. So, for example, given tree=tree, data=x, and a desired BM rate parameter of 10, you can compute: results-fitContinuous(tree,x,model=BM,bounds=list(beta=c(10.,10.0001))); # for example You can also just calculate the log-likelihood directly as follows: sig2-desired.value; # set your desired value for the BM rate, say sig2-10; as before. C-vcv.phylo(tree); C-C[names(x),names(x)]; # compute VCV for tree, and assure that the rows columns are in the same order as x n-nrow(C); ones-matrix(1,n,1); # compute some preliminaries a-solve(t(ones)%*%solve(sig2*C)%*%ones)%*%t(ones)%*%solve(sig2*C)%*%x; # estimate ancestor logL--(1/2)*(t(x-ones%*%a)%*%solve(sig2*C)%*%(x-ones%*%a))-(n/2)*log(2*pi)-(1/2)*log(det(sig2g*C)); # compute the log-likelihood Hope this is helpful! - Liam Liam J. Revell NESCent, Duke University web: http://anolis.oeb.harvard.edu/~liam/ NEW email: lrev...@nescent.org Carl Boettiger wrote: Dear r-sig-phylo, I'd like to be able to estimate the likelihood of a model with a given Brownian rate parameter under a given data set. For OU models, this can be accomplished through the ouch package with the flag fit=FALSE. However there does not seem to be a corresponding option for Brownian motion in ouch/geiger/ape algorithms for fitting a Brownian motion model? Perhaps I've overlooked something? -Carl ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] Phylogenetic ANOVA
Just to clarify the point - the null distribution for the test-statistic (F) in this method is generated by Brownian motion simulation on the phylogeny. The P-value of the ANOVA is thus obtained by comparing the observed test-statistic to simulated test-statistics (for an arbitrarily large number of simulations). The reference to this is: Garland, T. Jr., A. W. Dickerman, C. M. Janis, J. A. Jones. 1993. Phylogenetic analysis of covariance by computer simulation. Syst. Biol. 42: 265-292. (http://www.jstor.org/stable/2992464) Thus, if you have a test-statistic (F) more extreme then that obtained for every last one of your simulated datasets, then the P-value will be entirely determined by the number of simulations that are used (as Luke says). This seems to be case for your data (not surprising given the very large values for F that were obtained). - Liam Liam J. Revell NESCent, Duke University web: http://anolis.oeb.harvard.edu/~liam/ NEW email: lrev...@nescent.org Luke Harmon wrote: Yes that's a direct result of the number of simulations - if all of the simulated F statistics are smaller than the test statistics, then you will get: p = 1/(n+1) where n is the number of simulated data sets. lh On Jul 26, 2010, at 8:44 AM, Alejandro Gonzalez V wrote: Hello, Some colleagues and I are running some phylogenetic ANOVAS using the geiger package. In some of the analyses we get the same phylogentic p-value (very small p-value) even though the F-statistic differs between the two analyses, albeit it being relatively high in both instances. We were wondering why this arises, to get better grip on how the analysis works. We thought it may have to do with the randomizations to calculate the phylogenetic p-value. Or that the F-statistics are quite high... Below are two examples : m11-phy.anova(tree1,tmax,biozone,data.names=X,nsim=1000) Standard ANOVA: Analysis of Variance Table Response: td$data Df Sum Sq Mean Sq F valuePr(F) group 1 967.96 967.96 155.88 3.057e-12 *** Residuals 25 155.246.21 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Phylogenetic p-value: 0.000999001 m12-phy.anova(tree1,wt,biozone,data.names=X,nsim=1000) Standard ANOVA: Analysis of Variance Table Response: td$data Df Sum Sq Mean Sq F valuePr(F) group 1 602.88 602.88 109.01 1.333e-10 *** Residuals 25 138.265.53 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Phylogenetic p-value: 0.000999001 Cheers, Alejandro __ Alejandro Gonzalez Voyer Post-doc NEW ADDRESS NEW E-MAIL Estación Biológica de Doñana (CSIC) Avenida Américo Vespucio s/n 41092 Sevilla Spain E-mail: alejandro.gonza...@ebd.csic.es Tel: +34- 954 466700, ext 1749 Website (From my previous position): http://www.iee.uu.se/zooekol/default.php?type=personalpagelang=enid=146 [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Luke Harmon Assistant Professor Biological Sciences University of Idaho 208-885-0346 lu...@uidaho.edu ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] Question regarding simulate character option in geiger
Hi Alejandro. Your code won't work because model.matrix in sim.char() is the variance-covariance matrix for the traits, not the species. What you can do instead is the following: cp-corMartins(2.5,tree,fixed=T) C-corMatrix(Initialize(cp,data)) rownames(C)-tree$tip.label # naming the rows and columns ensures that the data vector will have labeled rows colnames(C)-rownames(C) x-chol(C)%*%rnorm(length(tree$tip.label)) Or alternatively you can do the following using sim.char(): s-matrix(1,1,1) x-sim.char(ouTree(tree,alpha=2.5),model.matrix=s,model=brownian) # for instance There are also several other related transformations, such as lambdaTree() and deltaTree. These can be reviewed by querying: ?ouTree Hope this is of some help. Sincerely, Liam Liam J. Revell NESCent, Duke University web: http://anolis.oeb.harvard.edu/~liam/ NEW email: lrev...@nescent.org Alejandro Gonzalez V wrote: Hello, I have a question about the simulate character command in geiger. I want to generate simulations of trait evolution under different models to estimate variance around the estimate of evolutionary parameters for that trait. I've seen in previous posts in the list that characters can be simulated under distinct evolutionary models by transforming the phylogenetic tree prior to the simulation. I was wondering if the same could be done by instead adjusting the model matrix. Specifically I was wondering if ape's corPagel or corMartins could be used to specify different matrices adjusted to distinct values of lambda or alpha instead of transforming the tree. Would this be an appropriate alternative to tree transformation prior to simulations? For example I was thinking of doing something of this sort for a model with alpha of 2.5 with a phylo object called tree and table with a trait called data: cp-corMartins(2.5, tree, fixed=TRUE) C-corMatrix(Initialize(cp, data)) sim.char(tree, model.matrix=C, nsims=100, model=brownian) Cheers, Alejandro __ Alejandro Gonzalez Voyer Post-doc NEW ADDRESS NEW E-MAIL Estación Biológica de Doñana (CSIC) Avenida Américo Vespucio s/n 41092 Sevilla Spain E-mail: alejandro.gonza...@ebd.csic.es Tel: +34- 954 466700, ext 1749 Website (From my previous position): http://www.iee.uu.se/zooekol/default.php?type=personalpagelang=enid=146 [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] retrieving simulated ancestral states from sim.char{geiger}
Hi all. Here is some code that I think will do what Dr. Felsenstein has suggested for a single character. It returns a vector containing the simulated states at all internal or terminal nodes, numbered according to the numbers used in tree$edge. Happy to hear if this works. See below: # conducts BM simulation by ascending through the tree from the root node to the tips # by Liam J. Revell sim.bm-function(tree,bm.rate,root.state){ # generate random normal deviates dev-rnorm(n=length(tree$edge.length)) # give them appropriate variances dev-dev*sqrt(bm.rate*tree$edge.length) # sort the tree edges, edge.lengths, and devs ranks-order(tree$edge[,1]) tree$edge-tree$edge[ranks,] tree$edge.length-tree$edge.length[ranks] dev-dev[ranks] # add them up state-matrix(0,length(tree$edge.length),1); for(j in 1:length(tree$edge.length)){ if(tree$edge[j,1]==length(tree$tip)+1) state[j]-root.state+dev[j] else state[j]-state[match(tree$edge[j,1],tree$edge[,2])]+dev[j] } # bookkeeping rownames(state)-tree$edge[,2] state-state[order(as.numeric(rownames(state))),] sim.bm-state } Sincerely, Liam Liam J. Revell NESCent, Duke University web: http://anolis.oeb.harvard.edu/~liam/ NEW email: lrev...@nescent.org Joe Felsenstein wrote: Luke wrote: The simulations in Geiger do draw from a mvn distribution but - in an unnecessary step - draw random numbers for every branch and use matrix multiplication to get the tip values. This is kind of dumb and needlessly slow but does give you the right answer. Carl is correct that simply drawing from a mvn distribution is faster and equivalent. If I had time I'd rewrite the function - maybe I will! Without knowing exactly what your R code is doing, I just want to add to these (correct) ideas that there is one simple way of getting tip and interior node values without much more effort. Work up the tree, drawing a set of normals (one for each character) to get the net changes in that branch. Keep going until you reach the tips. (If the characters are not independent, you also need to have on hand a matrix which contains the square root of the covariances among characters, and multiply the values at each node by that.) This way you don't need a nodes x nodes covariance matrix. Joe Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] Plotting trees - 'main' does not seem to work
Hi Steve, I have found the same thing with plot.phylo(). The same effect can be accomplished by the following two commands: plot(tree) # tree is your phylo object title(desired title) # desired title is what you would normally give to main You may have already figured this out, but in case not - I thought I would pass it along. - Liam Liam J. Revell NESCent, Duke University web: http://anolis.oeb.harvard.edu/~liam/ email: lrev...@nescent.org, (new) liam.rev...@umb.edu (new) blog: http://phytools.blogspot.com On 12/29/2010 12:42 PM, Steve L wrote: I've tried plotting the example trees in the ape package as well as my own using the 'main' parameter to set the title of my plots, but no matter what I set the 'main' parameter to, it doesn't print any text above the plot. Is this a bug? Steve [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] Cant read PHYLIP tree
This makes sense as I was also able to read in the file using read.tree() with no difficulty running in Windows. - Liam On 1/21/2011 10:46 AM, Leonor Palmeira wrote: On 21/01/11 16:39, Chris Mcowen wrote: Thanks for this, it works now. I wonder how that got there? It isn't in the nexus format? Very odd! Also, Enrico Rezende Chris, FYI, I opened your file in R with the read.tree command and had no problems to visualise your tree. tree- read.tree(tree.txt) tree Phylogenetic tree with 1325 tips and 1248 internal nodes. Tip labels: Acanthochlamys_bracteata, Barbacenia_albiflora, Barbacenia_pallida, Barbacenia_riedeliana, Barbacenia_trigona, Freycinetia_awaiarensis, ... Rooted; includes branch lengths. Enrico How could he open it in this format? It might be related to the system under which you're running R? I am running it under Linux and have the same error as you. Enrico might be running it under Windows, which might ignore this character. Sorry that I don't have any more insight on this matter. Leo. Thanks On 21 Jan 2011, at 15:36, Leonor Palmeira wrote: Hi, this is just due to a unallowed character in your file in Ptychosperma_harannii. If you change this, it should run fine. Leo. On 21/01/11 16:00, Chris Mcowen wrote: Hi, I am using a function that requires a tree in PHYLIP format. I had it in nexus ( which read in fine using the read.nexus function) i then opened it in figtree and output it as phylip. However when i try and open it i get this error: read.tree(/Users/chrismcowen/Desktop/tree.txt) NULL Warning message: In strsplit(tree, NULL) : input string 1 is invalid in this locale I can open it in figtree fine, so i am unsure why i cant in R? any help would be greatly appreciated. p.s - the tree is attached. Chris ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo -- Liam J. Revell NESCent, Duke University web: http://anolis.oeb.harvard.edu/~liam/ NEW email: lrev...@nescent.org [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] R: Re: R: Re: Simulation of Continuous Trait with Active Trend
Hi Pasquale, I think this may be possible if all tips are not contemporaneous. As Gene points out, the values at the tips of the tree given a trend should be distributed as ~MVN(mu=diag(vcv(tree))*tr+ancestor, V=vcv(tree)*sig2). Thus, we should be able to compute the likelihood of any data pattern at the tips and internal nodes of the tree, given tr, ancestor, and sig2. If we start with only the values at the tips and the phylogeny, we can estimate the states at internal nodes and our model by maximizing the likelihood. This would be our solution. I have written down some code for this using dmnorm() (from the mnormt package) to compute the multivariate normal density; a home-made function to compute vcv(tree) but for all the nodes in the tree; and optim() to maximize the likelihood - but it doesn't seem to work very well yet. I will pass it along if I can figure it out. - Liam -- Liam J. Revell University of Massachusetts Boston web: http://www.bio.umb.edu/facstaff/faculty_Revell.html (new) email: liam.rev...@umb.edu (new) blog: http://phytools.blogspot.com On 2/17/2011 12:22 PM, pasquale.r...@libero.it wrote: Hi all, to feed Dave's questions; I have body size data (showing high phylogenetic signal) and a trend verified independently (Cope's). Although all species in the tree are in fact fossil species, I derived from it a number of trees which are time-interval phylogenies, restricted to species living in a particular temporal interval. As such, interval phylogenies are ultrametric. I'm trying ape's yule.cov function to test whether body size did influence diversification, yet yule.cov requires aces which are calculated by assuming brownian motion, which is wrong given I know there is a real trend in the data. This is why I asked if it is possible to calculate aces by assuming BM with a positive (mu0) trend. Any hint? Pas Messaggio originale Da: dwba...@uchicago.edu Data: 17/02/2011 18.04 A: pasquale.r...@libero.itpasquale.r...@libero.it, R Sig Phylo Listservr-sig-phylo@r-project.org, Liam J. Revellliam.rev...@umb.edu, Gene Hunthu...@si.edu Ogg: Re: [R-sig-phylo] R: Re: Simulation of Continuous Trait with Active Trend All- Thanks for all the alternative ways to simulate trends! I'd say estimating ancestral traits always depends on the question you are after. If we are dealing with traits that may be evolving under an active trend, than we know reconstruction could be inaccurate because ancestral states are simply a weighted average of the observed values. Whether that invalidates ones approach depends on the question, how one deals with uncertainty in the ancestral trait estimates and the data itself (i.e. How many fossil taxa do you have? Do you have good a priori evidence for a particular root state? Does the trait have a high or low level of phylogenetic signal, as this influences the uncertainty?). I think Finarelli and Flynn have a good point that including a large sample of fossil taxa can help quite a bit in constraining our understanding of active trend patterns, but I think we (as paleontologists) need to give a great deal of consideration to how we want to treat lineages in the fossil record (as internal nodes or as terminal taxa) and how we time-scale our trees. -Dave On Thu, Feb 17, 2011 at 6:19 AM, pasquale.r...@libero.it mailto:pasquale.r...@libero.it pasquale.r...@libero.it mailto:pasquale.r...@libero.it wrote: Hi all, I have a simple question. Is it possible, or even feasible, to run ancestral state estimation under Brownian Motion with a trend, as modelled by Emmanuel, Gene and Liam? I wonder whether this is feasible with real data. Thanks all Pas Messaggio originale Da: emmanuel.para...@ird.fr mailto:emmanuel.para...@ird.fr Data: 17/02/2011 6.13 A: Hunt, Genehu...@si.edu mailto:hu...@si.edu Cc: R Sig Phylo Listservr-sig-phylo@r-project.org mailto:r-sig-phylo@r-project.org Ogg: Re: [R-sig-phylo] Simulation of Continuous Trait with Active Trend Hi all, In rTraitCont, the argument model can be a function, eg: f - function(x, l) x + rnorm(1, 1, sqrt(l * 0.1)) which is a way to model a trend. Here's an example of what you can do: tr - rbdtree(.1, .05) tr - makeNodeLabel(tr) x - rTraitCont(tr, model = f, ancestor = TRUE) bt - branching.times(tr) plot(-bt, x[names(bt)]) You may add the tip values with: n - Ntip(tr) points(rep(0, n), x[1:n], pch = 2) You can also draw lines linking the ancestors with their descendants in this morpho-time-space since they can
Re: [R-sig-phylo] R: Re: R: Re: Simulation of Continuous Trait with Active Trend
Hi Pasquale, To follow up on my earlier message, I have written a preliminary version of a simple function to simultaneously estimate the evolutionary parameters and ancestral states for Brownian evolution with a trend using likelihood. I will paste it at the end of this email. I have confirmed that the function returns the same trend parameter as fitContinuous(model=trend) and the same ancestral character estimates as ace() - when mu (the trend parameter) is forced to zero [note that this can only be done by modifying the code]. The parameter sig2, the Brownian rate, is biased by a factor n/(2n-2) for n species relative to fitContinuous(model=trend)$Trait1$beta. Note that this model can generally only be fit for a tree with tips that are not contemporaneous - for instance, for a phylogeny containing fossil species. Best, Liam Function code (also http://anolis.oeb.harvard.edu/~liam/R-phylogenetics/anc.trend/v0.1/anc.trend.R): # written by Liam J. Revell 2011 anc.trend-function(phy,x,maxit=2000){ # preliminaries # require dependencies require(ape) # set global tol-1e-8 # compute C D-dist.nodes(phy) ntips-length(phy$tip.label) Cii-D[ntips+1,] C-D; C[,]-0 counts-vector() for(i in 1:nrow(D)) for(j in 1:ncol(D)) C[i,j]-(Cii[i]+Cii[j]-D[i,j])/2 dimnames(C)[[1]][1:length(phy$tip)]-phy$tip.label dimnames(C)[[2]][1:length(phy$tip)]-phy$tip.label C-C[c(1:ntips,(ntips+2):nrow(C)),c(1:ntips,(ntips+2):ncol(C))] # sort x by phy$tip.label x-x[phy$tip.label] # function returns the negative log-likelihood likelihood-function(theta,x,C){ a-theta[1] u-theta[2] sig2-theta[3] y-theta[4:length(theta)] logLik-dmnorm(x=c(x,y),mean=(a+diag(C)*u),varcov=sig2*C,log=TRUE) return(-logLik) } # get reasonable starting values for the optimizer a-mean(x) sig2-var(x)/max(C) # perform ML optimization result-optim(par=c(a,0,sig2,rep(a,phy$Nnode-1)),likelihood,x=x,C=C,method=L-BFGS-B,lower=c(-Inf,-Inf,tol,rep(-Inf,phy$Nnode-1)),control=list(maxit=maxit)) # return the result ace-c(result$par[c(1,4:length(result$par))]); names(ace)-c(as.character(phy$edge[1,1]),rownames(C)[(length(phy$tip.label)+1):nrow(C)]) return(list(ace=ace,mu=result$par[2],sig2=result$par[3],logL=-result$value,convergence=result$convergence,message=result$message)) } -- Liam J. Revell University of Massachusetts Boston web: http://www.bio.umb.edu/facstaff/faculty_Revell.html (new) email: liam.rev...@umb.edu (new) blog: http://phytools.blogspot.com On 2/17/2011 12:45 PM, Liam J. Revell wrote: Hi Pasquale, I think this may be possible if all tips are not contemporaneous. As Gene points out, the values at the tips of the tree given a trend should be distributed as ~MVN(mu=diag(vcv(tree))*tr+ancestor, V=vcv(tree)*sig2). Thus, we should be able to compute the likelihood of any data pattern at the tips and internal nodes of the tree, given tr, ancestor, and sig2. If we start with only the values at the tips and the phylogeny, we can estimate the states at internal nodes and our model by maximizing the likelihood. This would be our solution. I have written down some code for this using dmnorm() (from the mnormt package) to compute the multivariate normal density; a home-made function to compute vcv(tree) but for all the nodes in the tree; and optim() to maximize the likelihood - but it doesn't seem to work very well yet. I will pass it along if I can figure it out. - Liam ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] PGLS, polytomies, degrees of freedom and multiple specimens
Hi Popko, My sense is that if the polytomy is hard, it is never necessary to subtract a degree of freedom. If the polytomy is firm (sensu Purvis Garland 1993) - in that the true tree is fully bifurcating but we cannot resolve a multifurcating node due primarily to rapid divergence of the descendant taxa - then subtracting a degree of freedom will probably be excessively conservative. If the polytomy is truly soft (sensu Maddison 1989; Garland Diaz-Uriarte 1999), then subtracting a degree of freedom for each extra furc will probably also be conservative. [As Rohlf 2001 points out, this would imply zero degrees of freedom if the phylogeny was completely unresolved (p. 2154), which seems excessive.] Some of these issues are discussed in Garland Diaz-Uriarte (1999; Syst. Biol.) and elsewhere. I'm sure that other subscribers to the list could add significant insight to this issue beyond what I am able to offer. - Liam -- Liam J. Revell University of Massachusetts Boston web: http://www.bio.umb.edu/facstaff/faculty_Revell.html (new) email: liam.rev...@umb.edu (new) blog: http://phytools.blogspot.com On 2/22/2011 5:31 AM, Popko Wiersma wrote: Dear all, Can somebody tell whether one should subtract degrees of freedom when applying PGLS with a phylogeny containing soft polytomies? And does it make a difference if polytomies originate from data from multiple specimens of species? cheers, Popko Wiersma - I reposted this message because the first attempt did not show the text directly in the message body. Apologies for any confusion - ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] Problems with transferring trees back into ape from Mesquite
To collapse nodes you could use the function di2multi() in ape which collapses all internodes with a length below tol (specified by the user). If your tree doesn't have branch lengths, you could just set all the edge lengths of the edges you want to keep to 1, and all the other ones to 0, and then use di2multi(). - Liam -- Liam J. Revell University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ (new) email: liam.rev...@umb.edu (new) blog: http://phytools.blogspot.com On 3/3/2011 3:12 PM, sd...@duke.edu wrote: I have had similar challenges with going back and forth between Macclade/Mesquite and R and FigTree. I got similar messages about the root and I only remember that I ended up using PAUP and TreeEdit to solve the issues. I wish I could remember more specifics but I did root and unroot the thing and try exporting as different file types. Eventually one of the programs reformatted the newick file and fixed the root issue so that it worked for FigTree and other downstream programs (I think the tree was destined for raxML to be used as a constraint tree). But I'm surprised R doesn't have functions for collapsing nodes.. Stacey Quoting David Bapst dwba...@uchicago.edu: Hello all, I recently decided to manually edit a very large supertree (1700 taxa) in Mesquite (I had to collapse a few nodes into polytomies, something I can only seem to do in Mesquite). I read it out of R as a Newick file, than into Mesquite, where I did my little edits and then saved the tree as a Nexus file, which I then tried to read into ape. But then I get this. spec_tree-read.nexus(file.choose()) Error in if (sum(tr$edge[, 1] == ROOT) == 1 dim(tr$edge)[1] 1) { : missing value where TRUE/FALSE needed I also tried reading the Nexus file into TreeFig, which didn't work. If I cut out a lot of the internal structure of the Nexus file (who needs Mesquite and Taxon blocks anyway?), I can get TreeFig to read it, export it into a Newick file or a new Nexus file and then try opening it in R with read.nexus(). That works. But then i can't write it in a new file in R with write.tree() nor write.nexus(), and also plot(tree) causes R to freeze. write.tree(spec_id_tree,file.choose()) Error in kids[[parent[i]]] : attempt to select less than one element write.nexus(spec_id_tree,file.choose()) #NEXUS [R-package APE, Mon Feb 28 10:03:59 2011] Read 1756 items Error in 1:(start - 1) : argument of length 0 After saving and opening the file in Mesquite a bunch of times, I repeatedly found new un-collapsed double node branches on the tree. No idea where they came from (maybe a by-product of my node collapsing?). I removed the new ones that popped up, deassigned branch lengths to the tree (it's a super-cladogram) and then saved and... finally, R read the nexus file and write.tree() worked. write.nexus() didn't work (same error as above), but I can read my tree and write it, at least, so my issue is solved but I'd still like to know what was causing the problems to begin with. I have no idea where the un-collapsed double branches came from or why they were causing this issue. I am using R v2.12.2 and ape v2.6-3. So that the error is completely reproducible, I have placed two Nexus files on my website which produce both the read.nexus() error (trash_a.nex) and the write.nexus() and write.tree() errors (trash_b). These are small, partially collapsed portions of my supertree with the species names deleted. They can be found at: http://home.uchicago.edu/~dwbapst/trash_a.nex http://home.uchicago.edu/~dwbapst/trash_b.nex -Dave -- David Bapst Dept of Geophysical Sciences University of Chicago 5734 S. Ellis Chicago, IL 60637 http://home.uchicago.edu/~dwbapst/ ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] Pruning a tree
Hi Eugen, Did you try my suggestion? In your case, if the species you want to keep are in a row separated text file, with a header (species), first read them in: species.to.keep-read.table(file=species.list.file,header=T) Now, for phylo object tree, type: pruned.tree-drop.tip(tree,tree$tip.label[-match(species.to.keep[,1], tree$tip.label)]) - Liam -- Liam J. Revell University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ (new) email: liam.rev...@umb.edu (new) blog: http://phytools.blogspot.com On 3/4/2011 6:52 AM, Eugen Egorov wrote: First of all, thank you for helping :) Well, this didn't work out, although it should work perfectly. When I type in nc - name.check(tree, data) and then nc I get all tree species (over 3000 species) in the $Tree.not.data part. In the $Data.not tree part I see 281 numbers from 1 to 281, this is the number of species in my data file. Seems like R doesnt recognize the species names in the data file. The data file is a .csv file and looks like this: species Species_1 Speceis_2 Species_3 Species_281 so its one column and nothing else. What should I do? Thanks a lot - Original Message - From: Graham Slater gsla...@ucla.edu To: Eugen Egorov eugen...@online.de Cc: r-sig-phylo@r-project.org Sent: Thursday, March 03, 2011 7:54 PM Subject: Re: [R-sig-phylo] Pruning a tree drop.tip assumes you have identified the tips that you want to remove, which you could do using nc - name.check(tree, data). newtree - drop.tip(tree, nc[[1]]) or newtree - drop.tip(tree, nc$Tree.not.data) # note that Tree has a caps and not using this could cause a weird tree with no tips to be output. But if your data are simply a list of names (and I assume here you mean you have a vector of names rather than an actual list) and you don't know exactly which species are missing then the following might be easier: missing - tree$tip.label[is.na(match(tree$tip.label, listofnames))] ## will use the match() function to identify the tips that are not present in your list - it essentially is trying to match the tip names from the tree to your list and if there is no match it reports NA for that tip. we get the tip names corresponding to those NAs here newtree - drop.tip(tree, missing) # this will remove those tips graham On Mar 3, 2011, at 9:15 AM, Eugen Egorov wrote: Hi all, I have a huge tree and a list with species. Now I want to prune the tree, so only species appearing in the list are left in the tree. I tried the geiger package to compare tree species with those in the list, but that didn't work out, because I recieved a tree with 0 tips and 1 node after using drop.tip$tree.not.data. I guess I have to format the list, but I don't know in which way. Any idea how I do that? greets Eugen [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Graham Slater Ph. D. Department of Ecology and Evolutionary Biology University of California, Los Angeles 610 Charles E. Young Drive South Los Angeles, CA 90095-1606 (424) 442-4348 gsla...@ucla.edu www.eeb.ucla.edu/gslater ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] Pruning a tree
Hi Eugen. name.check() uses the row names of your data matrix, so that's why it works now. The error message that was returned by the -match() method suggests that you have species in your to include file that are not actually in the tree (thus match() returns one or multiple NAs). I did not anticipate this. To get around it, just do: pruned.tree-drop.tip(tree,tree$tip.label[-na.omit(match(species.to.keep[,1],tree$tip.label))]) - Liam -- Liam J. Revell University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ (new) email: liam.rev...@umb.edu (new) blog: http://phytools.blogspot.com On 3/4/2011 1:08 PM, Eugen Egorov wrote: Well, Liams suggestion produced an error message only zeros are allowed to be mixed with negative indices. I just tried a bunch of command lines from various internet sites and came up with what seems to be a solution. Here the skript: tree-read.tree(tree.tre) data-read.csv(data.csv) data.frame(data[,1])-data1 rownames(data1)-data[,1] attach(data1) name.check(tree, data1) - overlap comptree-drop.tip(tree, overlap$Tree.not.data) ...and this works, but I don't know why exactly...seems like rownames is an imprtant factor, cause it doesn't work without that command line...for whatever reason... Does that make any sense? - Original Message - From: Graham Slater gsla...@ucla.edu To: Liam J. Revell liam.rev...@umb.edu Cc: Eugen Egorov eugen...@online.de; r-sig-phylo@r-project.org Sent: Friday, March 04, 2011 6:33 PM Subject: Re: [R-sig-phylo] Pruning a tree Yes, name.check() will only work with a named vector of data or data frame, so Liam's code should work for you. it also looks like your names in the vector species are different from those of the tip labels, as you say that nc$Tree.not.data gives you a bunch of numbers but your vector is made of Species underscore number, e.g. Species_1. If that's the case, either change your file of species names to be just numbers, or use the following to change your tree tip labels before using liam's code. tree$tip.label - paste(Species, tree$tip.label, sep = _) Graham Slater Department of Ecology and Evolutionary Biology University of California, Los Angeles 621 Charles E Young Drive South Los Angeles CA 90095-1606 (310) 825-4669 gsla...@ucla.edu www.eeb.ucla.edu/gslater On Mar 4, 2011, at 7:45 AM, Liam J. Revell wrote: Hi Eugen, Did you try my suggestion? In your case, if the species you want to keep are in a row separated text file, with a header (species), first read them in: species.to.keep-read.table(file=species.list.file,header=T) Now, for phylo object tree, type: pruned.tree-drop.tip(tree,tree$tip.label[-match(species.to.keep[,1], tree$tip.label)]) - Liam -- Liam J. Revell University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ (new) email: liam.rev...@umb.edu (new) blog: http://phytools.blogspot.com On 3/4/2011 6:52 AM, Eugen Egorov wrote: First of all, thank you for helping :) Well, this didn't work out, although it should work perfectly. When I type in nc - name.check(tree, data) and then nc I get all tree species (over 3000 species) in the $Tree.not.data part. In the $Data.not tree part I see 281 numbers from 1 to 281, this is the number of species in my data file. Seems like R doesnt recognize the species names in the data file. The data file is a .csv file and looks like this: species Species_1 Speceis_2 Species_3 Species_281 so its one column and nothing else. What should I do? Thanks a lot - Original Message - From: Graham Slater gsla...@ucla.edu To: Eugen Egorov eugen...@online.de Cc: r-sig-phylo@r-project.org Sent: Thursday, March 03, 2011 7:54 PM Subject: Re: [R-sig-phylo] Pruning a tree drop.tip assumes you have identified the tips that you want to remove, which you could do using nc - name.check(tree, data). newtree - drop.tip(tree, nc[[1]]) or newtree - drop.tip(tree, nc$Tree.not.data) # note that Tree has a caps and not using this could cause a weird tree with no tips to be output. But if your data are simply a list of names (and I assume here you mean you have a vector of names rather than an actual list) and you don't know exactly which species are missing then the following might be easier: missing - tree$tip.label[is.na(match(tree$tip.label, listofnames))] ## will use the match() function to identify the tips that are not present in your list - it essentially is trying to match the tip names from the tree to your list and if there is no match it reports NA for that tip. we get the tip names corresponding to those NAs here newtree - drop.tip(tree, missing) # this will remove those tips graham On Mar 3, 2011, at 9:15 AM, Eugen Egorov wrote: Hi all, I have a huge tree and a list with species. Now I want to prune the tree, so only species appearing in the list are left in the tree. I tried the geiger package to compare tree species
Re: [R-sig-phylo] Dealing with Bounded Trait Measures
I'm not sure that this is what Dave has in mind, but if anyone is interested in simulating bounded evolution in R, I just added it to my fastBM() function (code here: http://anolis.oeb.harvard.edu/~liam/R-phylogenetics/fastBM/v0.3/fastBM.R). In the process of evolving traits up the tree, I just bounce back any phenotypes that exceed the lower or upper boundary conditions specified by the user (by default they are -Inf and Inf). I think I did this properly. Feedback welcome though. - Liam -- Liam J. Revell University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ (new) email: liam.rev...@umb.edu (new) blog: http://phytools.blogspot.com On 3/5/2011 12:55 PM, tgarl...@ucr.edu wrote: Hello David, Enrico, et al., I may have lost track of what Dave was originally trying to do, and I am not familiar with all of the options presently available in r for simulating continuously valued traits along a specified phylogenetic tree. However, I wanted to point out that MANY possibilities, including trends, the OU process, and actual limits to trait evolution implemented in several ways, are available in our original DOS program PDSIMUL.EXE that accompanies this paper: Garland, T., Jr., A. W. Dickerman, C. M. Janis, and J. A. Jones. 1993. Phylogenetic analysis of covariance by computer simulation. Systematic Biology 42:265-292. It has been used many times to look at trends, limits, etc., e.g., in these papers: Díaz-Uriarte, R., and T. Garland, Jr. 1996. Testing hypotheses of correlated evolution using phylogenetically independent contrasts: sensitivity to deviations from Brownian motion. Systematic Biology 45:27-47. Laurin, M. 2010. Assessment of the relative merits of a few methods to detect evolutionary trends. Syst. Biol. 59:689-704. Cheers, Ted Theodore Garland, Jr. Professor Department of Biology University of California, Riverside Riverside, CA 92521 Office Phone: (951) 827-3524 Lab Phone: (951) 827-5724 Home Phone: (951) 328-0820 Facsimile: (951) 827-4286 = Dept. office (not confidential) Email: tgarl...@ucr.edu Main Departmental page: http://www.biology.ucr.edu/people/faculty/Garland.html List of all Publications: http://www.biology.ucr.edu/people/faculty/Garland/GarlandPublications.html Garland and Rose, 2009 http://www.ucpress.edu/books/pages/10604.php Original message Date: Sat, 05 Mar 2011 15:36:13 +0100 From: Enrico Rezendeenrico.reze...@uab.cat Subject: Re: [R-sig-phylo] Dealing with Bounded Trait Measures To: David Bapstdwba...@uchicago.edu Cc: R Sig Phylo Listservr-sig-phylo@r-project.org David, on the top of my head, if no species measurement strictly corresponds to zero, you may log-transform the data. You may then simulate Brownian motion in log-transformed values, which will correspond to a boundary of zero in a linear scale (i.e., the more negative the log number, the closer the trait value is to zero - but never zero - in a linear scale). This also explains why you can simulate the evolution of body mass employing Brownian motion in log-transformed units and no species will ever be assigned a body mass of zero. On more speculative grounds, this may simply reflect the fact that many biological processes and their regulation occur in a multiplicative, not additive, scale. The problem with regards to this approach is that you cannot really have any species with a trait = 0 given that the log-transformation is impossible in this case, so you might add some constant in case this occurs (caution because the constant would be arbitrary and might have an impact on the outcome of analyses). Did not think about this for too long, though. Hope this helps, Enrico El 4/3/11 9:14 p.m., David Bapst escribió: All- As far as I understand it, the vast majority of continuous character analyses assume that the trait is distributed normally and without bounds. Is there an appropriate transformation to for measurements of a trait that does have one or more bounds and where some taxa actually are at that bound? I have several traits where the bound is zero, and some taxa are actually at zero for this trait. (A practical example is 'spine length', where some taxa have virtually no spine.) And if there is no transformation applicable, is it analytically appropriate to remove taxa that have 'zero units' for that trait? Must we convert these traits to discrete categories to deal with them at all? As always, I appreciate your advice. -Dave Bapst, UChicago -- Enrico L. Rezende
Re: [R-sig-phylo] The Speed of Different Methods For Obtaining the Descendant Tip Taxa per Node
Hi Dave. It seems like one way to get these values faster would be to count the number of descendants as the tree is read in to R. This is possible because when the ) character is reached by the Newick parser, all descendant nodes (and tips) have already been created in memory. This was simple to add to my tree reading function read.newick() [code here: http://anolis.oeb.harvard.edu/~liam/R-phylogenetics/read.newick/v0.3/read.newick.R]. Now v0.3 of the function returns a modified phylo object with the vector $Ndesc containing the number of descendants from each internal and terminal node above the root (in the order of $edge[,2]). This is exactly the same vector as would be obtained by the following commands: desc-sapply(tree$edge[,2],function(x) node.leaves(tree,x)) Ndesc-sapply(desc,function(x) length(x)) Note that read.newick() only reads one tree at a time in phylip format (and without node labels) - but reading multiple trees or node labels, or reading from nexus format, would quite easy to add if you find that this is indeed faster. In addition, I believe that newick2phylog() creates this list in reading a Newick tree into memory as a phylog object. This is stored as phy$Aparam$nlea for phylog object phy. Unfortunately, it seems like newick2phylog() is very slow for large trees. - Liam -- Liam J. Revell University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://phytools.blogspot.com On 3/6/2011 12:47 PM, David Bapst wrote: Hello all, I'm currently trying to measure a parameter over a large number of large trees (1700 tips), and part of this calculation requires knowing the tip taxa descended from each node (I must compare the difference in tip values among taxa descended from a node). Because I must do this many times, I decided to compare the efficiency of several methods for doing this in various R libraries with system.time() (as Liam did recently with some BM simulation functions in some recent blog posts). As I feel that others may benefit from this comparison of methods, I am posting the results to this list. Geiger's node.leaves() is much faster than the alternatives, although at ~13s a run, it is still not a particularly speedy process. I didn't need the actual tip.labels, so I took node.leaves() and made it as lean as possible, to produce node.tips(), below. That cut the run time down to ~9 sec. node.tips-function (phy, node){ n- length(phy$tip.label) if (node= n){node}else{ l- numeric() d- phy$edge[which(phy$edge[,1]==node),2] for(j in d){if(j= n){l- c(l, j)}else{l-c(l, node.tips(phy,j))}} l}} If anyone knows of another alternative that might be faster, I would greatly appreciate any suggestions. -Dave Bapst, UChicago Geosci Using the modified node.leaves() function from geiger, node.tips() system.time(desc-sapply(edge_end,function(x) node.tips(res_tree,x))) user system elapsed 9.390.009.44 Using node.leaves() in geiger system.time(desc-sapply(edge_end,function(x) node.leaves(res_tree,x))) user system elapsed 13.290.02 13.34 Using Descendants() in phangorn system.time(desc-sapply(edge_end,function(x) Descendants(res_tree,x))) user system elapsed 75.600.10 76.83 Using listTips() in adephylo system.time(desc_list-c(as.list(1:Ntip(res_tree)),listTips(res_tree))) user system elapsed 75.272.93 87.28 Using descendants() in phylobase system.time(desc-sapply(edge_end,function(x) descendants(res_tree4,x,type=tips))) user system elapsed 149.560.67 155.15 Using nodeDecendants() in maticce (note that translating the tree into ouchtree format was prohibitively very lengthy) system.time(res_tree_ou-ape2ouch(res_tree)) user system elapsed 84.010.13 85.34 system.time(desc-sapply(edge_end,function(x) nodeDescendents(res_tree_ou,x))) user system elapsed 65.910.23 68.47 ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] reverse order plotting of newick tree/phylo object
Hi Thierry, There might be a more elegant way to do this, but you can just apply the ape function rotate() to each node number of the tree (excluding tips). I.e. tr2-tree for(i in length(tr2$tip)+1:tr2$Nnode) tr2-rotate(tr2,i) plot(tr2) [rotate() may also be able to take a vector of nodes, but I was not able to get this to work.] - Liam -- Liam J. Revell University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://phytools.blogspot.com On 3/17/2011 10:51 AM, Thierry Janssens - TNW wrote: Dear R-sig-phylo, I am looking for a method to plot an unrooted tre/phylo object e in the reverse order (of the tip labels). Like all the nodes would have rotated. Any of you has an idea? Kind regards, Thierry Thierry Janssens Postdoctoral researcher Delft University of Technology Bionanoscience Kavli Institute of Nanoscience Lorentzweg 1 2628LJ Delft the Netherlands Tel: +31 15 2781175 Fax:+31 15 2781202 e-mail: t.k.s.janss...@tudelft.nlmailto:t.k.s.janss...@tudelft.nl [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] reverse order plotting of newick tree/phylo object
Dan is right on - and I also suspect that the issue of the non-rotating node is probably due to a polytomy at that node. Thanks for the insight Dan. - Liam -- Liam J. Revell University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://phytools.blogspot.com On 3/17/2011 12:26 PM, Dan Rabosky wrote: But can you explain to me what is the rationale behind this? There are only 47 nodes and 54 tips. How can the nodes from 55 to 47 than be rotated? Liam's code is rotating nodes 55+(1:47), rather than 55 to 47. The internal node indexing in ape's phylo class starts with number of tips plus 1. Thus, using rotate on nodes starting with length(phy$tip.label) + 1 means it is starting with node 56 (the root node) and going through length(phy$tip.label) + phy$Nnode, e.g.,, all internal nodes. You can see that 55+(1:47) gives a vector of node numbers that is not the same as 55 to 47. If it isn't working right - maybe it is that you have polytomies? I think you should have n-1 internal nodes (53 in your case). Try multi2di (from ape) for polytomy resolution and see if it works as desired. ~Dan Rabosky Kind regards, Thierry Thierry Janssens Postdoctoral researcher Delft University of Technology Bionanoscience Kavli Institute of Nanoscience Lorentzweg 1 2628LJ Delft the Netherlands Tel: +31 15 2781175 Fax:+31 15 2781202 e-mail: t.k.s.janss...@tudelft.nl -Original Message- From: Liam J. Revell [mailto:liam.rev...@umb.edu] Sent: donderdag 17 maart 2011 16:11 To: Thierry Janssens - TNW Cc: r-sig-phylo@r-project.org Subject: Re: [R-sig-phylo] reverse order plotting of newick tree/phylo object Hi Thierry, There might be a more elegant way to do this, but you can just apply the ape function rotate() to each node number of the tree (excluding tips). I.e. tr2-tree for(i in length(tr2$tip)+1:tr2$Nnode) tr2-rotate(tr2,i) plot(tr2) [rotate() may also be able to take a vector of nodes, but I was not able to get this to work.] - Liam -- Liam J. Revell University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://phytools.blogspot.com On 3/17/2011 10:51 AM, Thierry Janssens - TNW wrote: Dear R-sig-phylo, I am looking for a method to plot an unrooted tre/phylo object e in the reverse order (of the tip labels). Like all the nodes would have rotated. Any of you has an idea? Kind regards, Thierry Thierry Janssens Postdoctoral researcher Delft University of Technology Bionanoscience Kavli Institute of Nanoscience Lorentzweg 1 2628LJ Delft the Netherlands Tel: +31 15 2781175 Fax:+31 15 2781202 e-mail: t.k.s.janss...@tudelft.nlmailto:t.k.s.janss...@tudelft.nl [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] Branch and Bound Maximum Parsimony (Matthew Vavrek)
Matthew. The only thing that I would add is that if you *really* want to do an exhaustive search in R, and your species number is small enough to permit an exhaustive search (i.e., =10), then it is straightforward enough to do so: require(phangorn) data-read.phyDat(file=filename,type=USER) # for binary data all.trees-allTrees(n=length(data),tip.label=names(data),rooted=FALSE) pscores-vector() for(i in 1:length(all.trees)) pscores[i]-parsimony(all.trees[[i]],data) minscore-min(pscores); mp.tree-all.trees[pscores==minscore] mp.tree will be a single MP tree or a list if there are several MP trees. Of course this will take a long time for more than a 7 or 8 species, and will not work at all for more than 10 species. It's also possible that an exhaustive search in a traditional phylogeny inference program like PAUP* might be faster if, for instance, PAUP* retains the score for parts of the tree that don't change among a set of trees - instead of recalculating it each time anew. That I don't know. - Liam -- Liam J. Revell University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://phytools.blogspot.com On 3/28/2011 10:58 AM, Ross Mounce wrote: Dear Matthew, Branch and Bound searching is often unneccessary, are you sure you need to do this? With enough random addition sequences (relative to dataset size) you *will* find all MPTs. PAUP* despite being familiar to many of us and very fully-featured, is definitely ungainly for modern analyses with it's lack of parallelisation. Have you considered using TNT? http://www.cladistics.com/aboutTNT.html Wiki Manual here: http://tnt.insectmuseum.org/index.php/Main_Page Many paleontologists are starting to use this - computationally it's far more efficient. Just remember, as with any and all phylogenetic analyses it's garbage in, garbage out: you must know what you're doing and why *before* you start your analysis. Kind Regards and good luck, Ross Mounce ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] Adding a taxon to a pre-exisiting tree
Hi Matthew. I don't doubt that other members of the list have better suggestions, but it is possible to add a tip in all possible places using bind.tree() in ape. For instance, starting with a random unrooted tree with, say, 4 taxa: tree-rtree(n=4,rooted=FALSE,br=rep(1,5)) # create a 5th species, here t5, to add as a phylo object # [I don't think this can be avoided with bind.tree()] new.tip-list(edge=matrix(c(2,1),1,2),tip.label=t5,edge.length=1,Nnode=1) class(new.tip)-phylo # add the new tip to all edges of the tree trees-list(); class(trees)-multiPhylo for(i in 1:nrow(tree$edge)) trees[[i]]-bind.tree(tree,new.tip,where=tree$edge[i,2],position=0.5) # now plot them to see what we have done plot(trees,type=unrooted,use.edge.length=F) Of course I would be happy to see a more elegant solution! Sincerely, Liam -- Liam J. Revell University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://phytools.blogspot.com On 4/1/2011 11:02 PM, Matthew Vavrek wrote: Hello again all, I got a number of great solutions to my last question about branch and bound maximum parsimony searches, several of which had definitely not crossed my mind. However, being stubborn (or is that stupid? Hard to tell sometimes) I went ahead and have started putting together a branch and bound style search function for R, just because (I'll try to get it up on CRAN shortly). It works, however it's not as efficient as I think it probably could be, probably because the method I use to add new taxa to the tree involves text searches with grep to create Newick trees. All that being said, is there any way to take an existing tree (in any format, such as Newick, but also the edge lists like in an ape 'phylo' object) and add a taxon at all the possible positions? I know allTrees() exists, but that would give me all the possibilities, rather than a restricted set (as I would need for branch and bound). Thanks again Matthew ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] Adding a taxon to a pre-exisiting tree
Of course, we could generalize my preceding suggestion with the following function: add.everywhere-function(tree,tip.name){ tree-unroot(tree) tree$edge.length-rep(1,nrow(tree$edge)) new.tip-list(edge=matrix(c(2,1),1,2),tip.label=tip.name, edge.length=1,Nnode=1) class(new.tip)-phylo # add the new tip to all edges of the tree trees-list(); class(trees)-multiPhylo for(i in 1:nrow(tree$edge)){ trees[[i]]-bind.tree(tree,new.tip,where=tree$edge[i,2], position=0.5) trees[[i]]$edge.length-NULL } return(trees) } Try it: tree-read.tree(text=((George,Paul),(Ringo,John));) trees-add.everywhere(tree,Pete_Best) plot(trees,type=unrooted) -- Liam J. Revell University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://phytools.blogspot.com On 4/1/2011 11:33 PM, Liam J. Revell wrote: Hi Matthew. I don't doubt that other members of the list have better suggestions, but it is possible to add a tip in all possible places using bind.tree() in ape. For instance, starting with a random unrooted tree with, say, 4 taxa: tree-rtree(n=4,rooted=FALSE,br=rep(1,5)) # create a 5th species, here t5, to add as a phylo object # [I don't think this can be avoided with bind.tree()] new.tip-list(edge=matrix(c(2,1),1,2),tip.label=t5,edge.length=1,Nnode=1) class(new.tip)-phylo # add the new tip to all edges of the tree trees-list(); class(trees)-multiPhylo for(i in 1:nrow(tree$edge)) trees[[i]]-bind.tree(tree,new.tip,where=tree$edge[i,2],position=0.5) # now plot them to see what we have done plot(trees,type=unrooted,use.edge.length=F) Of course I would be happy to see a more elegant solution! Sincerely, Liam ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] Converting from DNAbin to a matrix
Hi Josh, Try doing this when you read in the data: ex.dna-read.dna(exdna.txt,format=sequential,as.character=T) This will return an object of class matrix instead of a DNAbin object. The elements in the matrix should just be the letters from your input file. There may be a better way to do this, but if you already have the object in memory, you can write it to file and then read it back in using read.dna(...,as.character=T), as above. - Liam -- Liam J. Revell University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://phytools.blogspot.com On 4/7/2011 11:16 AM, Josh B wrote: Dear Listserve, I have a seemingly simple problem that I cannot figure out. I would like to convert an element of class DNAbin to a matrix. Consider the example from the read.dna help file: cat(3 40, No305 NTTCGACACACCCACTACTNTTATCAGTCACT, No304 ATTCGACACACCCACTACTATTATCAACCACT, No306 ATTCGACACACCCACTACTATTATCAATCACT, file = exdna.txt, sep = \n) ex.dna- read.dna(exdna.txt, format = sequential) This is where I am stuck: how do I create a matrix from ex.dna, where the rownames are No305, No304, and No306, and the columns are the aligned single-nucleotide sites. For example, Column 1 should have the value of N for row No305, the value of A for row No304, and the value of A for row No306, corresponding to the values of the first site in the sequence. Thank you very much for your help! --- Josh Banta, Ph.D Center for Genomics and Systems Biology New York University 100 Washington Square East New York, NY 10003 Tel: (212) 998-8465 http://plantevolutionaryecology.org [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
[R-sig-phylo] Possible bug in bind.tree()
This is mainly for Emmanuel, but I send to the entire list just in case someone else can identify my error first! I believe I have identified a small bug in bind.tree(). For some reason, when I bind a tree with a root.edge to a tip, the length of the edge leading to that tip is attached not only to the root.edge of the bound subtree, but also to another tip of the tree. If I have made some mistake here, my apologies(!), but any help in correcting it is also welcome of course. The easiest way to illustrate this problem is via example: # preliminaries text=(1:3.785,(8:0.13,9:0.13):0.205,6:0.335):0.154,(10:0.278,7:0.278):0.211):0.1,(4:0.386,5:0.386):0.203):2.655,(2:1.08,3:1.08):2.164):0.54); tree-read.tree(text=text) node-19 # this is the node we will extract position-1.0 # this is the length of the root.edge # extract clade and attach root edge tr1-extract.clade(tree,node=node); tr1$root.edge-1.0 # now remove tips in tr1 from tree tr2-drop.tip(tree,tr1$tip.label,trim.internal=FALSE) # subtract root.edge of tr1 from tip ending in NA tr2$edge.length[match(which(tr2$tip.label==NA),tr2$edge[,2])]-tr2$edge.length[match(which(tr2$tip.label==NA),tr2$edge[,2])]-position # now bind tr1 to tr2, where it was removed tr.bound-bind.tree(tr2,tr1,where=which(tr2$tip.label==NA)) # now plot, to visualize the result plot(tree); x11(); plot(tr.bound) Thanks. - Liam -- Liam J. Revell University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://phytools.blogspot.com ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] phylogenetically-informed Reduced Major Axis regression in R?
Hi Arne, Just calculating the slope is straightforward. For tree and column vectors x y (in order tree$tip.label): C-vcv.phylo(tree) ax-sum(solve(C,x))/sum(solve(C)) ay-sum(solve(C,y))/sum(solve(C)) beta1-sqrt(t(y-ay)%*%solve(C,y-ay)/(t(x-ax)%*%solve(C,x-ax))) The model intercept can be obtained by plotting the slope through the phylogenetic mean of x y. beta0-ay-beta1*ax I wrote a function to do this; but also to (optionally) fit Pagel's lambda jointly for x y and to return the residuals in y. I did this last year, and then fixed a few bugs/errors when I saw your query this morning. I have put it online here: http://anolis.oeb.harvard.edu/~liam/R-phylogenetics/phy.RMA/v0.1/phyl.RMA.R. Note, that this *does not* test hypotheses about the RMA regression nor does it incorporate individual variation (as in the Ives et al. 2007 article). Adding the former would be fairly simple using equations 15 of Ives et al., I think (I will do this when I get a chance); adding the latter not so much. I believe Tony Ives may have MATLAB code that does this already. Maybe he can comment. I hope this is somewhat helpful. Sincerely, Liam -- Liam J. Revell University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://phytools.blogspot.com On 4/20/2011 2:27 AM, Arne Mooers wrote: Dear members: Does anyone know of scripts to both estimate and test phylogenetically-corrected RMA regression slopes (perhaps using the relevant equations from Ives et al. (Syst. Biol. 2007))? Cheers, Arne Mooers __ Dr. Arne Mooers Biological Sciences, Simon Fraser University Burnaby, BC Canada V5A 1S6 tel. +1 778 782 3979 skype: arnemooers http://www.sfu.ca/~amooers http://www.sfu.ca/fabstar http://www.scientists-4-species.org ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] Convert filogenetic tree to binary matrix
Hi Vanderlei, Try this which uses ape function prop.part(): temp-prop.part(tree) X-matrix(0,nrow=length(tree$tip),ncol=length(temp),dimnames=list(tree$tip.label,tree$node.label)) for(i in 1:ncol(X)) X[temp[[i]],i]-1 - Liam -- Liam J. Revell University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://phytools.blogspot.com On 4/28/2011 2:21 PM, Vanderlei Debastiani wrote: Good morning! I need to create a binary matrix with all node of a phylogenetic tree and the presence of each taxo in their respective node. Example: require(ape) y-read.tree(text=(E,((H,I)D,(F,G)C)B)A;) y plot(y, show.node=TRUE) I need to create a binary matrix as follows: AB CD G 1 1 1 0 F 1 1 1 0 I 1 1 0 1 H 1 1 0 1 E 1 0 0 0 Somebody could help me to solve this problem. Thanks, Vanderlei Debastiani Laboratório de Ecologia Filogenética e Funcional Programa de Pós-Graduação em Ecologia Universidade Federal do Rio Grande do Sul www.ufrgs.br/leff ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] Generating a tree based on a distance matrix?
You can get the least squares branch lengths. I believe that if the distance matrix is a patristic distance matrix from your tree, then the least squares branch lengths are guaranteed to be the same as your original branches (and you should have a sum of squares error, Q, of zero - to numerical precision). To test this, you can just use my optim.phylo.ls() function, and set the input tree to your target tree. You will need to have phangorn and ape to run this: source(http://anolis.oeb.harvard.edu/~liam/R-phylogenetics/optim.phylo.ls/v0.3/optim.phylo.ls.R;) # load the source LStree-optim.phylo.ls(D,stree=tree) Note that LStree will be unrooted even if tree is rooted. To test this, try the following: tree-rtree(5,rooted=F) # random unrooted tree tree$edge.length # show edge lengths [1] 0.005683385 0.516492136 0.761613776 0.714302898 0.071461088 0.217680734 0.393926894 D-cophenetic(tree) # compute distance matrix tree$edge.length-NULL # delete branch lengths test-optim.phylo.ls(D,tree) # compute LS tree with true tree as input best Q score of 4.19082355898663e-30 found after 0 nearest neighbor interchange(s). test$edge.length [,1] 6,7 0.005683385 7,1 0.516492136 7,8 0.761613776 8,2 0.714302898 8,3 0.071461088 6,4 0.217680734 6,5 0.393926894 You can easily see that they are the same branch lengths as in our original tree. I hope this is helpful. Sincerely, Liam -- Liam J. Revell University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://phytools.blogspot.com On 5/12/2011 5:18 PM, mgavil2 wrote: All, I have a tree topology (tree_name.tre), and a distance matrix, based on that tree topology. However I cant not seem to find the nexus file from which the matrix was generated. Is there a way to use that distance matrix to incorporate branch lengths into my topology? I have looked into all the threads of questions posted in the list, but still can not find an answer. my final objective is just to generate a tree with branch lengths proportional to the distances on the matrix (reviewers requirement for a publication). Any suggestions would be greatly appreciated! Thanks. Maria Mercedes ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] ACE - ML
Sorry, I didn't see that this had already been addressed. -- Liam J. Revell University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://phytools.blogspot.com On 5/17/2011 12:39 PM, David Bapst wrote: Alanna- It's because when multi2di() resolves the polytomies, it puts in zero-length edges. This causes the phylogenetic variance-covariance matrix calculated from your tree to be singular. One solution would be to add a small arbitrary constant to your zero-length edges, although I would caution you to try several different values and see how your results differ. -Dave Bapst, UChicago On Tue, May 17, 2011 at 10:23 AM, Alanna Maltbyalanna.mal...@ioz.ac.ukwrote: Dear all I've been trying to use Maximum Likelihood to estimate ancestral characters using ace in APE, but I get an error: ?Error in solve.default(out$hessian) : Lapack routine dgesv: system is exactly singular? My code looks like this: tree-read.nexus(tree.nex) data-read.csv(data.csv) BW-data.frame(data[,c(2,3)]) rownames(BW)-data[,2] na.omit(BW)-BW name.check(tree,BW)-overlap drop.tip(tree,overlap$Tree.not.data)-tree2 tree3-multi2di(tree2) BW2-BW[tree3$tip.label,] BW3-data.frame(BW2[,2]) as.matrix(BW3)-BWM as.vector(BWM)-BWV names(BWV)-BW2[,1] asrBW-ace(BWV, tree3, type=continuous) asrBW$ace which is possibly not super-elegant, but works when I use pic instead of ML. I notice that someone brought up the same problem last year but there were no solutions. Any ideas? Many thanks, Alanna - Alanna Maltby PhD Student - Evolution of Echolocation in Bats University College London and Institute of Zoology Tel: +44 (0) 20 7449 6322 Website: www.zsl.org/alannamaltbyhttp://www.zsl.org/alannamaltby The Zoological Society of London is incorporated by Royal Charter Principal Office England. Company Number RC000749 Registered address: Regent's Park, London, England NW1 4RY Registered Charity in England and Wales no. 208728 _ This e-mail has been sent in confidence to the named a...{{dropped:21}} ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] ACE - ML
Hi Alanna, You can get this kind of error if your tree has tip edges with zero length. Then C-vcv.phylo(tree) will be exactly singular. Could this be true of your tree? Best, Liam -- Liam J. Revell University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://phytools.blogspot.com On 5/17/2011 11:23 AM, Alanna Maltby wrote: Dear all I've been trying to use Maximum Likelihood to estimate ancestral characters using ace in APE, but I get an error: ?Error in solve.default(out$hessian) : Lapack routine dgesv: system is exactly singular? My code looks like this: tree-read.nexus(tree.nex) data-read.csv(data.csv) BW-data.frame(data[,c(2,3)]) rownames(BW)-data[,2] na.omit(BW)-BW name.check(tree,BW)-overlap drop.tip(tree,overlap$Tree.not.data)-tree2 tree3-multi2di(tree2) BW2-BW[tree3$tip.label,] BW3-data.frame(BW2[,2]) as.matrix(BW3)-BWM as.vector(BWM)-BWV names(BWV)-BW2[,1] asrBW-ace(BWV, tree3, type=continuous) asrBW$ace which is possibly not super-elegant, but works when I use pic instead of ML. I notice that someone brought up the same problem last year but there were no solutions. Any ideas? Many thanks, Alanna - Alanna Maltby PhD Student - Evolution of Echolocation in Bats University College London and Institute of Zoology Tel: +44 (0) 20 7449 6322 Website: www.zsl.org/alannamaltbyhttp://www.zsl.org/alannamaltby The Zoological Society of London is incorporated by Royal Charter Principal Office England. Company Number RC000749 Registered address: Regent's Park, London, England NW1 4RY Registered Charity in England and Wales no. 208728 _ This e-mail has been sent in confidence to the named add...{{dropped:20}} ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] fitContinuous in geiger
Hi Annemarie, Positive log-likelihoods are not a problem. The log-likelihood is calculated by summing the log probability densities, which come from a function that integrates to 1.0. Thus, if the variance of this distribution is small, the value of the function will be large (i.e., greater than 1.0). The phenomenon of decreasing mean lambda when you increase the scale (i.e., multiply by 10 or 100) is probably due to bounds on beta (in the lambda model, sigma^2) in fitContinuous(). The default upper bound is 20. You can change this by executing: fitContinuous(...,bounds=list(beta=c(0,1000)) # or something once you fix this issue, the mean lambda for any scale of your data vector should be the same. You can also try my function for estimating phylogenetic signal: phylosig(...,method=lambda) at URL: http://anolis.oeb.harvard.edu/~liam/R-phylogenetics/ which does not have this issue. Good luck. - Liam -- Liam J. Revell University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://phytools.blogspot.com On 5/18/2011 3:50 AM, Annemarie Verkerk wrote: Hi all, I’m having some trouble with the function fitContinuous in the geiger library. I'm using fitContinuous to estimate a lambda score as an indication for the presence of phylogenetic signal. As a sidenote, I'm doing this with language data - so language trees based on shared vocabulary and it is a linguistic typological trait that I'm trying to get estimates of lambda of. Another sidenote is that I have similar problems in BayesTraits but no problems using phylosignal in picante for estimating lambda. At the moment, there are 14 taxa in my sample. I have a tree set of 1000 trees. The first data set + trees are attached. My data values are all values between 0 and 1, basically things like '0.326547'. (This is because they come from a principal components analysis; they are scores on the first principal component that explains about 80% of the variation.) I've been using capped values with two numbers after the period just for easy usage, so '0.33'. However, the results that I get are strange. My first dataset looks like this: language value t1 0.32 t4 0.52 t6 0.95 t9 0.75 t10 0.77 t12 0.46 t14 0.61 t2 0.35 t3 0.29 t5 0.25 t7 0.89 t8 0.88 t11 0.79 t13 0.35 Then I do the fitContinuous analysis over my sample of trees (1000 trees) and these are my scores: median of lambda: [1] 1 mean of lambda: [1] 0.985 sd of lambda [1] 4.60849e-05 So: almost all values of lambda are 1. median of log-likelyhood [1] 5.206887 mean of log-likelyhood [1] 5.210839 sd of log-likelyhood [1] 0.4215943 The log-likelyhood is positive? That is very strange…? These results basically make it seem as if the algorithm has crashed. Then, I multiply my values with 100: language value t1 32 t4 52 t6 95 t9 75 t10 77 t12 46 t14 61 t2 35 t3 29 t5 25 t7 89 t8 88 t11 79 t13 35 results: median lambda: [1] 0.9874361 mean lambda: [1] 0.9839095 sd lambda: [1] 0.01622255 median log-likelihood: [1] -65.66331 mean log-likelihood: [1] -65.73675 sd log-likelihood: [1] 1.716778 Now the number of lambda scores of '1' is lower, although it is not really gone yet, there are still around a 200-300 instances of '1'. The log-likelyhood is now -65, so at least it's negative. When I multiply my original data points with 1000, this is my data set: value language value t1 320 t4 520 t6 950 t9 750 t10 770 t12 460 t14 610 t2 350 t3 290 t5 250 t7 890 t8 880 t11 790 t13 350 results: median lambda: [1] 0.8640076 mean lambda: [1] 0.8561964 sd lambda: [1] 0.05001523 median log-likelihood: [1] -2055.763 mean log-likelihood [1] -2067.052 sd log-likelihood [1] 213.44 There are no no more lambda scores of ‘1’ in the data, but the log likelood is a really big number, and I'm not sure what that would mean in this context? So, even though the range of variation stays exactly the same with these multiplications, there are quite important differences between the results these three sets of data give me. It was suggested to me that the algorithm might be doing something to my data values, for instance cap them, round them off or not taking into account certain decimals, and that might be the reason for these different results. Would anyone have any idea about why this happens and how I can deal with it in an informative way? Thanks so much for any help that you might be able to offer, Annemarie Verkerk ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] fitContinuous in geiger
Hi Annemarie, The only thing I would add to Carl's comment is that the theoretical limit of lambda is not 1.0, but can be found (for an ultrametric tree) by computing: C-vcv.phylo(tree) maxLambda-max(C)/max(C[upper.tri(C)]) You can then change the boundary condition for fitContinuous(): fit-fitContinuous(phy=tree,x,model=lambda,bounds= list(alpha=c(0,maxLambda))) My function phylosig() automatically finds the upper boundary condition and uses this for the optimization: source(http://anolis.oeb.harvard.edu/~liam/R-phylogenetics/phylosig/v0.3/phylosig.R;) fit-phylosig(tree,x,method=lambda,test=TRUE) Best of luck. - Liam -- Liam J. Revell University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://phytools.blogspot.com On 5/23/2011 2:46 PM, Carl Boettiger wrote: Hi Annemarie, No problem, tried to give some answers below. On Mon, May 23, 2011 at 8:05 AM, Annemarie Verkerk annemarie.verk...@mpi.nl mailto:annemarie.verk...@mpi.nl wrote: Dear Carl, Liam, and others, thanks for your explanation of what went wrong in the fitContinuous calculations. I set beta to a large number (100) in order to stop it from reading the maximum value. Then, I got exactly the same results for lambda with the non-multiplied and the multiplied data. Okay, I still have two more problems - probably they will sound stupid to you, but I am still very much a newbie to this and if it suffices just to point me to other sources where this has already been discussed please do so! the log likelyhood: between the non-multiplied and the multiplied data, there is still a difference. Liam, you write 'if the variance of this distribution is small, the value of the function will be large (i.e., greater than 1.0)'. However, the variance between the non-multiplied and the multiplied data is exactly the same. So why should log likelyhood values change when data is multiplied? Why do I get a normal looking value (around -200) (with the beta scale set for a very large scale) for the multiplied data? And why does the log likelyhood become so big (-2000) when the initial maximum beta value of 20 is reached if the scale is (0,20)? Liam is referring to the variance of the likelihood distribution, not the variance of the traits. If the diversification rate is high, then any particular outcome is very improbable, so its probability density has high variance and corresponding low value for the prob density at any particular value. Hence the large negative log-likelihood for large beta. (Recall log lik around -2000 means a density of exp(-2000), which is very near zero, illustrating why we need to use logs!) the lambda scores: now, my lambda scores for the non-multiplied and the multiplied data are the same. However, most of them (999 out of 1000) are '1'. To my understanding, '0' and '1' are theoretical limits for lambda that one normally does not reach. So I'm afraid I'm still unsure why this happens, and what it means. You raise an important point here. Just because they are the boundary values, does not imply that these theoretical limits are uncommon -- in fact one may expect to hit the limits more often than any other value. Numerical optimization will often push estimates of a parameter all the way to its boundary. This just means that increasing (deceasing) the parameter value increases the likelihood, so it keeps doing so until it can go no further. This can result from, but does not necessarily imply, that the model is inappropriate (it is also particularly common for small numbers of taxa, for instance), and can be more common on relatively flat likelihood surfaces. So I guess my short answer is this isn't a problem and my longer answer is be suspicious whenever you get back boundary estimates, and consider bootstrapping. HTH, Carl I hope you don't mind this new response to the thread. With kind regards, Annemarie Liam J. Revell wrote: Hi Annemarie, Positive log-likelihoods are not a problem. The log-likelihood is calculated by summing the log probability densities, which come from a function that integrates to 1.0. Thus, if the variance of this distribution is small, the value of the function will be large (i.e., greater than 1.0). The phenomenon of decreasing mean lambda when you increase the scale (i.e., multiply by 10 or 100) is probably due to bounds on beta (in the lambda model, sigma^2) in fitContinuous(). The default upper bound is 20. You can change this by executing: fitContinuous(...,bounds=list(beta=c(0,1000)) # or something once you fix this issue, the mean lambda for any scale of your data vector should be the same. You can also try my function for estimating phylogenetic signal
Re: [R-sig-phylo] ancestral state reconstruction with fixed internal node(s)
Hi Annemarie, I do think this makes sense - *if* (and this may be a big if, depending on your data) we have a value for a hypothetical ancestor for which we are very confident and which we are sure belongs at a particularly ancestral node. In that case, you could indeed use ace() by assigning attaching a zero length tip edge to the internal node of interest. To illustrate this using an example, let's do the following: # first generate a pure-birth tree using birthdeath.tree in geiger tree-drop.tip(birthdeath.tree(b=1,d=0,taxa.stop=21),21) # now plot the tree with internal node numbers plot(tree); nodelabels(length(tree$tip)+1:tree$Nnode) # now let's simulate on the tree using fastBM() from my phytools package # (not on CRAN, but http://anolis.oeb.harvard.edu/~liam/R-phylogenetics) # in this case we know the ancestral states if we set internal=TRUE x-fastBM(tree,internal=T) y-x[1:length(tree$tip)] # only the tip states # now, let's say we want to attach the tip to node 22 in this case tip-list(edge=matrix(c(2L,1L),1,2),tip.label=22,edge.length=0,Nnode=1L) class(tip)-phylo atree-multi2di(bind.tree(tree,tip,where=22)) z-c(y,x[22]) # now let's estimate ancestral states - we can do this using # method=gls res.gls-ace(z,atree,method=GLS,corStruct=corBrownian(1,atree)) Keep in mind that the node numbers in tree and atree are different (because the latter has a different number of tips) *and* we have a different number of them (because we used multi2di() to fully resolve our multifurcating tree after we attached the extra tip). Hope this helps. - Liam -- Liam J. Revell University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://phytools.blogspot.com On 6/30/2011 12:40 PM, ppi...@uniroma3.it wrote: Hi AnneMarie, I dont really know if this makes sense; in fact ancestral state reconstruction is an ** a posteriori estimation ** of nodal values starting from tips observations. A trick could be to add a false taxon lnked to that node and giving to it a 0 branch length (i.e. plitomized - you can then resolve this during computation using multi2di() ) and assigning your desired trait value. I'm not sure if this helps Best Paolo Dear phylo-sig list people, I want to do ancestral state reconstruction (preferably with ace) with one (or more) of the internal nodes 'fixed' for a range / a distribution of values. For instance, I want a node leading to one particular clade that is present a subset of my trees to have a value from 0.2-0.5, and then do the ancestral state reconstruction with this restriction on that node. I have been searching the archives and the net for discussion of this (but maybe with the wrong search terms), and I cannot find anything about it - not how to do it in R or in any other package. Thanks for your input! Annemarie -- Annemarie Verkerk, MA Evolutionary Processes in Language and Culture (PhD student) Max Planck Institute for Psycholinguistics P.O. Box 310, 6500AH Nijmegen, The Netherlands +31 (0)24 3521 185 http://www.mpi.nl/research/research-projects/evolutionary-processes ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] a question about pic function(ape package)
Hi. This is because your tree is not fully dichotomous. tree-read.nexus(file=tree.nex) is.binary.tree(tree) [1] FALSE You can resolve multifurcations with branches of zero length using multi2di(), e.g.: tree-multi2di(tree) is.binary.tree(tree) [1] TRUE Hope this helps. - Liam -- Liam J. Revell University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://phytools.blogspot.com On 7/11/2011 10:44 PM, wrote: dear all, I have a question when I used the pic funtion, it appeared an error:'phy' is not rooted and fully dichotomous, I don't understand what's the problem, the attached file is my phylogenetic information, please check it, I am sorry to trouble you all, but I really want to solve this problem in my study, thanks a lot. best wishes to you all! ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] pic() vs gls()
Hi Nick. For your first (simple) problem, I believe you want to do: read.csv(file=Mason_data.csv,row.names=1)-smdata Regarding the more complicated issue, the problem of non-independence in linear regression comes in the residual error of the model. Thus, you should fit a correlation structure for the error in Y given X (not for X Y separately as you have done with PICs here). This indeed can be done using gls() - so, for instance: pglsModel1a-gls(Pause_Rate~Comp.1, correlation=corMartins(1, spbm),data=smdata) fits a linear model in which the residual error of Pause_Rate given Comp.1 is distributed according to the correlation structure corMartins() with alpha estimated using REML. The way to reproduce this result using contrasts is the following: alpha-attr(pglsModel1a$apVar,Pars)[corStruct] # get fitted alpha smdata-as.matrix(smdata) # important pr_contrast-pic(smdata[,1],ouTree(spbm,alpha=alpha)) pc1_contrast-pic(smdata[,2],ouTree(spbm,alpha=alpha)) summary(lm(pr_contrast~pc1_contrast-1)) Which if you compare to: summary(pglsModel1a) you should find yields the same results and P-value. (Note that smdata-as.matrix(smdata) seems to be important here as if smdata is a data frame, then smdata[,1] does not inherit the rownames of smdata and pic() will return an incorrect set of contrasts.) I hope this helps. No doubt other subscribers to the list may have something to add. Best, Liam -- Liam J. Revell University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://phytools.blogspot.com On 7/13/2011 11:07 PM, Nicholas Mason wrote: Dear R-sig-phylo users, I have a question regarding comparative analyses of contrasts done with the functions fitContinuous() and pic() compared to using PGLS (using the gls() function). From my understanding the first method involving pic() below fits alpha (estimated using fitContinuous()) to each character independently then performs a regression on the resulting contrasts. The second (PGLS) method involving gls() fits both the regression and contrasts simultaneously and reports a single alpha value for the relationship between two traits. These two methods appear very similar, yet I get slightly different results between the two functions, particularly with respect to p-values. Using fitContinuous(), the results from the data set attached are r2 = 0.075 p = 0.091. Using gls(): r2 = 0.079 p =0.0519. Furthermore, if I switch the dependent and independent variables (V1~V2 vs V2~V1), I get the same answer with pic(), but gls() differs: r2 = -0.069 p =0.02 (see pglsModel1a vs pglsModel1b in the attachment). Could anyone explain why these functions vary (in my mind they were doing essentially the same thing) and if there are situations where one should be favored over another? Also, why does PGLS vary if you switch the dependent/independent variables in the linear model? Thanks in advance for any advice or comments offered!! Cheers, Nick Mason Below I have the code I have been using (also attached as Mason.R): require(ape) require(nlme) require(geiger) read.csv(file=Mason_data.csv)-smdata rownames(smdata)-smdata[,1] smdata[,1]-NULL#ASIDE: If anyone could tell me how to get around these two lines of code, that would also be awesome. Header=T doesn't seem to work for me. read.tree(file=Mason_tree.tre)-spbm name.check(smdata,spbm) fitContinuous(spbm,smdata,model=OU)-ou2 pr_contrast-pic(smdata[,1],ouTree(spbm,alpha=ou2$Pause_Rate$alpha)) pc1_contrast-pic(smdata[,2],ouTree(spbm,alpha=ou2$Comp.1$alpha)) summary(lm(pr_contrast~pc1_contrast-1)) pglsModel1a-gls(Pause_Rate~Comp.1, correlation=corMartins(1, spbm),data=smdata) summary(pglsModel1a) pglsModel1b-gls(Comp.1~Pause_Rate, correlation=corMartins(1, spbm),data=smdata) summary(pglsModel1b) - Nicholas Albert Mason MS Candidate, Burns Lab Department of Biology - EB Program San Diego State University 5500 Campanile Drive San Diego, CA 92182-4614 845-240-0649 (cell) ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] follow up for lambda transform - query: how to use an existing covariance matrix directly in a gls procedure in R?
Hi Luke. For a fixed value of lambda this is very easy. For this example, you would just do: library(ape) data(bird.orders) dat-data.frame(y=rnorm(23),x=rnorm(23)) rownames(dat)-bird.orders$tip.label mat-vcv(bird.orders,cor=TRUE) # following Blomberg to here lambda-0.75 # for instance fit1-gls(y~x,correlation=corSymm(lambda*mat[lower.tri(mat)],fixed=TRUE),data=dat) # compare to corPagel fit2-gls(y~x,correlation=corPagel(0.75,bird.orders,fixed=TRUE),data=dat) summary(fit1) summary(fit2) In addition, a trick to get the full correlation/covariance matrix after lambda transformation (given a set value of lambda) is just: mat.lambda-lambda*(mat-diag(diag(mat)))+diag(diag(mat)) I'm not sure whether or not this is exactly what you're looking for, but I hope it helps. Best, Liam -- Liam J. Revell University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://phytools.blogspot.com On 7/15/2011 4:52 PM, Luke Matthews wrote: This question picks up on a thread from last week, copied below. I've also been able to use the approach Simon presented so concisely with gene networks that are not strictly bifurcating. This only allows for the Brownian model; that is, no transforms of the entries supplied in the covariance matrix. I'd like to apply a lambda-type transform to the covariance or correlation matrix directly. Does anyone know how to co-opt corPagel for use directly with the covariance matrix, or else an alternative implementation in an nlme correlation structure? Thanks Luke Matthews [R-sig-phylo] query: how to use an existing covariance matrix directly in a gls procedure in R? Simon Blomberg s.blomberg1 at uq.edu.aumailto:r-sig-phylo%40r-project.org?Subject=Re%3A%20%5BR-sig-phylo%5D%20query%3A%20how%20to%20use%20an%20existing%20covariance%20matrix%0A%20directly%20in%20a%20gls%20procedure%20in%20R%3FIn-Reply-To=%3C011EDAAAEA824D44AE3EC8D2562AB98402DDFE279F%40UQEXMB06.soe.uq.edu.au%3E Fri Jul 8 01:44:48 CEST 2011 * Previous message: [R-sig-phylo] query: how to use an existing covariance matrix directly in a gls procedure in R?https://stat.ethz.ch/pipermail/r-sig-phylo/2011-July/001474.html * Next message: [R-sig-phylo] query: how to use an existing covariance matrix directly in a gls procedure in R?https://stat.ethz.ch/pipermail/r-sig-phylo/2011-July/001477.html * Messages sorted by: [ date ]https://stat.ethz.ch/pipermail/r-sig-phylo/2011-July/date.html#1475 [ thread ]https://stat.ethz.ch/pipermail/r-sig-phylo/2011-July/thread.html#1475 [ subject ]https://stat.ethz.ch/pipermail/r-sig-phylo/2011-July/subject.html#1475 [ author ]https://stat.ethz.ch/pipermail/r-sig-phylo/2011-July/author.html#1475 I used to do this before ape existed. Here is some example code. library(ape) data(bird.orders) # some made up data dat- data.frame(y=rnorm(23), x=rnorm(23)) rownames(dat)- bird.orders$tip.label mat- vcv(bird.orders, cor=TRUE) fit- gls(y~x, correlation=corSymm(mat[lower.tri(mat)], fixed=TRUE), data=dat) #assuming ultrametric tree # compare with corBrownian structure: fit2- gls(y~x, correlation=corBrownian(phy=bird.orders), data=dat) # are the regression coefficients the same? all.equal(coef(fit), coef(fit2)) [1] TRUE I hope this helps, Simon. From: r-sig-phylo-bounces at r-project.orghttps://stat.ethz.ch/mailman/listinfo/r-sig-phylo [r-sig-phylo-bounces at r-project.orghttps://stat.ethz.ch/mailman/listinfo/r-sig-phylo] On Behalf Of Tomlinson, Kyle [kyle.tomlinson at wur.nlhttps://stat.ethz.ch/mailman/listinfo/r-sig-phylo] Sent: Friday, July 08, 2011 5:30 AM To: 'r-sig-phylo at r-project.orghttps://stat.ethz.ch/mailman/listinfo/r-sig-phylo' Subject: [R-sig-phylo] query: how to use an existing covariance matrix directly in a gls procedure in R? Dear Dr Paradis I trust that I am allowed to contact you directly like this? If not, my sincere apologies. I have constructed the covariance matrix for a phylogenetic tree (using the vcv() command), which i would like to use directly in a gls() procedure, instead of using the tree directly as appears to be done in ape (because I only use a very small part of the tree i have constructed..). {I am doing this in order to check my own gls procedure and the gls procedure of PHYLOGr which I dont trust.} Do you know if this can be done? I have tried to figure it out by considering the corStruct class options in nlme, but none of these options seems to allow one to directly input an existing covariance matrix, and I simply dont understand the object construction methods of R well enough to build a suitable corStruct object myself. I hope you can help. sincerely Kyle Tomlinson Resource Ecology Group Centre for Ecosystem Studies Wageningen University Droevendaalsesteeg 3a 6708 PB Wageningen The Netherlands Phone: +31 317
Re: [R-sig-phylo] LTT plot non-ultrametric trees
This is implemented in my phytools package. This is not yet on CRAN, but can be downloaded from http://anolis.oeb.harvard.edu/~liam/R-phylogenetics/. The function is ltt(). It is slow, but seems to work. Please let me know if you have success with this. - Liam -- Liam J. Revell University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://phytools.blogspot.com On 8/10/2011 8:51 PM, Rob Lanfear wrote: Hi All, I have a set of simulated birth-death trees, which include extinct and extant lineages. I'd like to see if my simulations are doing what I think they should be, by plotting out the number of lineages through time. Does anyone know of a method of doing this that incorporates extinct lineages, i.e. one where the number of lineages can decrease as well as increase through time. I can see how it could be done (I think...) by making an ordered data frame of node times (+1 lineage) and extinction events (-1 lineage), but it seems like somebody might already have done it. Cheers, Rob ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] LTT plot non-ultrametric trees
Hi Rob. Thanks for identifying the bug. This should be fixed, I hope, in the newest version of phytools (v0.0-6; available: http://anolis.oeb.harvard.edu/~liam/R-phylogenetics/). The way this function works is by first computing the heights above the root of all the nodes (including tip nodes) in the tree, and then at each event (that is, a speciation or extinction), it counts the number of edges that include that time point. This is slow. It seems likely that Emmanuel's suggestion (in a previous email) to use dist.nodes() would run much faster than this. - Liam -- Liam J. Revell University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://phytools.blogspot.com On 8/11/2011 12:42 AM, Rob Lanfear wrote: Hi Liam, Thanks for the help. Extremely useful. And thanks for clearing up my lack of understanding about the differences in maximum ages! Cheers, Rob On 11 August 2011 14:05, Liam J. Revell liam.rev...@umb.edu mailto:liam.rev...@umb.edu wrote: Hi Rob. I can reproduce your error, but I haven't figured out the problem yet. You can try an earlier version of this function, which seems to work: source(http://anolis.oeb.__harvard.edu/~liam/R-__phylogenetics/ltt/v0.3/ltt.R http://anolis.oeb.harvard.edu/~liam/R-phylogenetics/ltt/v0.3/ltt.R) p2 - ltt(t1, log.lineages=FALSE, drop.extinct=FALSE) Sorry about this. Also: max(p1$times)==max(p2$times) can be FALSE because if drop.extinct is set to TRUE, then the crown age of the pruned tree can be smaller than in the full tree if some lineages arising at the root of the tree do not leave any extant descendants. - Liam -- Liam J. Revell University of Massachusetts Boston web: http://faculty.umb.edu/liam.__revell/ http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu mailto:liam.rev...@umb.edu blog: http://phytools.blogspot.com On 8/10/2011 9:50 PM, Rob Lanfear wrote: library(TreeSim) library(phytools) library(geiger) #simulate tree of 100 taxa with initial diversification followed by a period of B=D t1 - sim.rateshift.taxa(100, 1, c(0.2, 0.2), c(0.2, 0.05), c(1,1), c(0, 20))[[1]] t1 #confirm number of extant taxa is 100 n.extant- length(prune.extinct.taxa(t1)$__tip.label) n.extant #do ltt plot without extinct taxa (works fine) p1- ltt(t1, log.lineages=FALSE, drop.extinct=TRUE) #do ltt plot with extinct taxa (looks odd) p2- ltt(t1, log.lineages=FALSE, drop.extinct=FALSE) #max times don't seem to correspond between the two plots. max(p1$times)==max(p2$times) -- Rob Lanfear Postdoc, Centre for Macroevolution and Macroecology, Research School of Biology, Australian National University Tel: +61 2 6125 7270 www.robertlanfear.com http://www.robertlanfear.com ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] Interpreting rate.evol.mcmc output
Hi Roland. Thanks for your interest in this function and method. The method is described in an article that is now available as an Accepted Article online (http://dx.doi.org/10./j.1558-5646.2011.01435.x). Unfortunately, this was posted online by 'Evolution' without its supplementary appendix (which contains details on running and interpreting the function evol.rate.mcmc()), but I would be happy to send this to you off-list. evol.rate.mcmc() implements a method in which MCMC is used to identify the phylogenetic position of a shift in the evolutionary rate for a continuously valued character. It can also be used to obtain estimates of the evolutionary rates tipward and rootward of this shift. What you are looking at is the raw posterior sample from this analysis. Depending on how long you run the MCMC you should have 100s or 1000s of rows in $mcmc 100s or 1000s of elements in $tips. To make sense of this, you should use two different functions: minSplit() and posterior.evolrate(). minSplit() takes the tree and $mcmc as input. You should first decide how many generations (and thus how many rows of $mcmc, which will depend on sampling frequency) you want to exclude as burn-in. It returns the shift-point that minimizes the summed or the sum of squared distances to all the other sampled points in the posterior. posterior.evolrate() pre-processes the posterior sample so that you can make sense of it. It takes as input the tree, the average shift point (say, the output from minSplit()), $mcmc, and $tips. It returns a matrix containing a pre-processed posterior sample of evolutionary rates tipward and rootward of the median shift point which can then be analyzed in a typical way. (For instance, you can compute parameter estimates as the mean or median; as well as HPD intervals and ESSs using, for instance, the MCMC diagnostics package coda.) I hope this is helpful. Thanks again for giving the method a try. Please don't hesitate to contact me for further clarification. Best of luck. - Liam -- Liam J. Revell University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://phytools.blogspot.com On 8/12/2011 11:07 AM, Roland Sookias wrote: Hi Can anyone help me interpret the output from phytools' rate.evol.mcmc function? I get, as it says I should, results from the MCMC run and a tips list of stips in state sig(1)^2 for each sampled generation of MCMC (to polarize the rate shift)., but I'm not sure what I'm looking for practically to see if/when there have been rate shifts. The mcmc list is like this: 2 200 0.004979113 0.002156864 1.889112 131 3.6386714 -38.68685 3 300 0.008243820 0.006182640 1.964227 97 19.7978315 -36.52512 And the tips like this: $tips[[5]] [1] Effigia_okeeffeae Cheers Roland ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] Blomberg's K and assumptions regarding independent contrasts
Yes, this would be circular. For instance, consider the situation in which the data have no signal and thus the best branch-length transformation (perhaps) would shorten all internal edges to 0, and lengthen all terminal edges to a constant. K is guaranteed to be 1.0 (regardless of the data) on this tree (and any star tree). I hope this is helpful. Sorry for the slow reply. All the best, Liam -- Liam J. Revell University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://phytools.blogspot.com On 8/12/2011 1:35 AM, Manabu Sakamoto wrote: Hi everyone, When performing Blomberg et al.'s test for phylogenetic signals, do the data and/or branch lengths have to conform to the assumptions of Brownian motion as discussed in Garland et al. (1992) and elsewhere, i.e. do contrasts have to be adequately standardised? Or can data and branch lengths be the original untransformed values? I have the vague sense that it would be circular to make sure the data/branch lengths conform to BM and then test for deviations from BM by computing K. Or am I wrong to think this way? many thanks, Manabu Manabu Sakamoto, PhD Postdoctoral Research Associate School of Earth Sciences University of Bristol Bristol, UK, BS8 1RJ Tel: +44 (0) 117 954 5421 Fax: +44 (0)117 925 3385 Email: m.sakam...@bristol.ac.uk ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] Maximum Parsimony branch lengths
Hi Robin. Yes, you can do this under the ACCTRAN (accelerated transformation) criterion using the function acctran() in phangorn. To do this given a data file sequences.dna in FASTA format: dna-read.phyDat(sequences.dna,format=fasta) tree-pratchet(dna) # or you could use optim.parsimony() tree-acctran(tree,dna) I hope this helps. - Liam -- Liam J. Revell University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://phytools.blogspot.com On 9/2/2011 8:35 AM, Velzen, Robin van wrote: Dear R-sig phylo list members, I am trying to apply maximum Parsimony tree searches on a large number of simulated datasets within the R environment. So far I have found one function implementing Parsimony: the optim.parsimony function in the package phangorn. This function finds a shortest tree for a typical dataset (1000 taxa) reasonably fast which is great. However, the resulting tree does not include branch lengths (i.e. it is a fully bifurcating cladogram). This is problematic for me because there are nodes separating identical sequences. Is there a way (i.e. function) to calculate branch lengths in terms of number of substitutions for a tree? ..Or does someone know an alternative function to get a maximum Parsinomy tree with branch lengths? Any help or suggestion would be most welcome. Thanks! Robin Robin van Velzen Biosystematics Group Wageningen University [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] edge representation of trees produced by birthdeath.tree() in geiger
Hi Andrew. This is because the numbers in tre$edge are node tip numbers, not tip labels (which also can be numbers). The translation between tip numbers and tip labels is given by tre$tip.label. Here, tip labels are indexed by their numbers in tre$edge. For instance, in your case we have: tre$edge [,1] [,2] [1,] 10 11 [2,] 11 12 [3,] 12 13 [4,] 131 [5,] 132 ... (some rows excluded) and: tre$tip.label [1] 8 9 3 1 6 7 2 4 5 This tells us, to take your example, that node 13 should be connected to tips 1 2, which tre$tip.label tells us have labels 8 9, just as we see with: plot(tre); nodelabels() I hope this clears things up somewhat. All the best, Liam -- Liam J. Revell University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://phytools.blogspot.com On 9/9/2011 7:51 AM, Andrew Barr wrote: library(ape) tre-read.tree(text=8:2.850732306,9:0.1789410068):1.147884602,3:0.8848821048):1.411242805,(1:3.869881809,((6:0.6904865854,7:0.02044458561):1.984259398,2:0.04497273798):0.3022512176):0.09298177296):0.3029547699,(4:0.8015431218,5:1.467916249):2.117025776);) ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] post-order tree traversal
Hi Nick. If you use the function reorder.phylo(...,order=pruningwise) the resultant tree edge matrix is suitable for post-order traversal (i.e., the daughters always precede the parents in top-to-bottom matrix traversal). This might be helpful for what you would like to do - for instance if you just label the nodes as they are first encountered in the matrix this would be in the order of a post-order traversal of the tree. All the best, Liam -- Liam J. Revell University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://phytools.blogspot.com On 9/14/2011 9:23 PM, Nick Matzke wrote: Hi all, To display the node reconstructions from another program in APE, I need to label my internal nodes as they would be labeled in a post-order tree traversal. Is there an easy way/function to do this somewhere, or do I have to write my own tree-traversing functions? Cheers, Nick ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] Simulating binomial trait shifts on a phylogeny
Hi Antigoni. I have a function in the new version of phytools called sim.history() that might help with this. This version is not available yet on CRAN, but it can be downloaded from my webpage: http://anolis.oeb.harvard.edu/~liam/R-phylogenetics/. The function generates stochastic histories according to some rate matrix, and stores the histories along the states at all terminal nodes. This code might be helpful in your task. It simulates random trees, for a given number of changes it computes the instantaneous rate matrix that will produce that number of changes, and then it simulates (repeatedly) under that rate until the desired number of changes is obtained. The code fragment uses functions in ape, geiger, and phytools: nchanges-5 # desired number of changes ntrees-100 # desired number of trees ntaxa-100 # desired number of taxa require(phytools); require(geiger) # required packages mtrees-list() for(i in 1:ntrees){ # simulate a stochastic B-D tree using birthdeath.tree tree-drop.tip(birthdeath.tree(b=1,d=0,taxa.stop=(ntaxa+1)),as.character(ntaxa+1)) # compute the rate that results in (on average) nchanges changes q-nchanges/sum(tree$edge.length) # simulate a stochastic history mtree-sim.history(tree,Q=matrix(c(-q,q,q,-q),2,2)) # loop until desired number of changes while((length(unlist(mtree$maps))-nrow(mtree$edge))!=nchanges) mtree-sim.history(tree,Q=matrix(c(-q,q,q,-q),2,2)) mtrees[[i]]-mtree # for fun, plot the history plotSimmap(mtrees[[i]]) } class(mtrees)-multiPhylo # tip states for the kth tree are in mtrees[[k]]$states Please let us know if this is useful or if you have any questions or difficulties implementing. All the best, Liam -- Liam J. Revell University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://phytools.blogspot.com On 10/11/2011 6:18 PM, Antigoni Kaliontzopoulou wrote: Hello everyone, I am studying a binomial trait across a phylogeny and I'm trying to simulate specific numbers of evolutionary shifts (1, 5, 10 shifts) across random trees. Does anyone know how this can be done? Any help is greatly appreciated Cheers, Antigoni ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] Simulating a phylogeny with outgroup on long branch
Hi Scott. I'm not sure why you want to do this, but based on your description it sounds like you could just generate two separate trees and then paste them together across a deep basal split. The following is an example of doing this, although I'm sure you could also come up with your own using different functions than those that I have chosen: require(phytools); require(geiger) # load packages ntaxa1-50; ntaxa2-50 # number of taxa in each clade total.height-10 # total tree height (may need to be adjusted) # simulate subtree 1 tr1-birthdeath.tree(b=1,d=0,taxa.stop=ntaxa1) tr1$tip.label-1:ntaxa1; tr1$root.edge-0 # simulate subtree 2 tr2-birthdeath.tree(b=1,d=0,taxa.stop=ntaxa2) tr2$tip.label-1:ntaxa2+ntaxa1; tr2$root.edge-0 # compute tree heights, so that the whole thing can be ultrametric height1-max(nodeHeights(tr1)) height2-max(nodeHeights(tr2)) # create the basal split tr3-list(Nnode=1,edge=matrix(c(3,1,3,2),2,2,byrow=T),edge.length=c(total.height-height1,total.height-height2),tip.label=c(tr1,tr2)) class(tr3)-phylo # attach the two subtrees one by one tr3$tip.label[tr3$tip.label==tr1]-NA tr3-paste.tree(tr3,tr1) tr3$tip.label[tr3$tip.label==tr2]-NA tr3-paste.tree(tr3,tr2) # plot the full tree plot(tr3) I hope this is kind of what you are going for. All the best, Liam -- Liam J. Revell University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://phytools.blogspot.com On 10/13/2011 5:32 PM, Scott Chamberlain wrote: Hello, I want to simulate two types of phylogenies: one normally done with, for example, rcoal in ape; and the second type with an outgroup on a long branch. The first is easy with rcoal. I'm not sure how to make the second kind of tree. An example of this second kind is a tree with say all animals, and then an outgroup with all plant species, so that you have many animal species that have relatively short branches among them, and then the distance between plants and animals is relatively long. Is there a way to simulate this second kind of tree with any available R functions? Thanks! Scott Chamberlain Rice University [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] basal dichotomy in phylo object
In my phytools package (http://cran.r-project.org/web/packages/phytools/index.html) there is a function reroot() that is really just a simple wrapper around root() that allows you to root along internal or terminal edges (rather than only at nodes). In your example, assuming that the node 500 is tipward of 174 in the tree, to re-root the tree halfway along the edge leading to node number 500 (in tree$edge), you would just do: rooted.tree-reroot(tree,node.number=500,position=tree$edge.length[tree$edge[,2]==500]/2) I think. All the best, Liam -- Liam J. Revell University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://phytools.blogspot.com On 10/25/2011 10:44 AM, Ondřej Mikula wrote: Hello, I have an unrooted tree, but I know that basal dichotomy is between nodes labeled 174 and 500. I am wandering how to indicate it in the object of class 'phylo'. I tried 'root' function of 'ape' but with no success. I will be grateful for any advice. Best wishes Ondrej Mikula ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] Comparative Methods and Pseudo-Traits
Hi David. In general it is inconsequential whether X or Y are biologically inherited traits; but whether the residual error in Y given X is correlated or independent among species. In the case of growth rate as a function of habitat degradation this corresponds to: Growth Rate = beta0 + beta1*Habitat Degradation + e It seems highly plausible that the residual error in Growth Rate (e), that is error not explained by Habitat Degradation, is phylogenetically correlated. This would imply only that closely related species have growth rates that deviate from the mean relationship between growth rate and habitat degradation in similar ways. In this case, PIC or some other phylogenetic method should be used. To be clear also, contrasts need to be computed for both X Y regardless of whether or not there is phylogenetic autocorrelation in both X Y! (See Revell 2010, MEE; or, better yet, Stone 2011, Syst. Biol. doi:10.1093/sysbio/syq098.) Regarding extinction probability, perhaps someone else on the list can address that specific example. All the best, Liam -- Liam J. Revell University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://phytools.blogspot.com On 11/10/2011 3:36 PM, David Bapst wrote: Hello all, A recent discussion set my mind thinking on a particular issue and, once again, I decided to ask for the general opinion of R-Sig-Phylo denizens. It may be easier to start with an example. Let's say that there exists a worker who is measuring several different traits across a number of species and then testing for correlations among these traits. The first test is body size versus growth rate and they use independent contrasts or PGLS to test for a the correlation, accounting for phylogeny. Both of these traits are inherited, evolving variables. Now let's say they'd like to test for the relationship between growth rate and some metric of the anthropogenic degradation of that species' habitat. Now what? It is even valid to apply PIC to the habitat degradation metric even though it is not an inherited, evolving trait? It's unclear to me. Let's consider a paleontological example, one which I have found myself both strongly agreeing and disagreeing with at times. Essentially, how should we test for extinction selectivity on some trait at a mass extinction event? Let's say we think body size is a predictor of the risk of extinction during that event and so we want to test for a correlation between them (please ignore that extinction would be a discrete variable for the moment). Do we treat these variable with PIC or PGLS? Is it really proper to refer to the probability of going extinct during a mass extinction as an evolving trait? Let's say we did and we got different results than when we used an analysis which did not account for the phylogenetic covariance. How should we interpret these results? One explanation I know of is that when we apply phylogenetic comparative methods to these quasi-traits to consider their relationship to another trait, we are assuming that these variables are actually the result of some underlying, unobserved set of traits which are evolving along the phylogeny. This makes sense, maybe in the extinction event case, which would mean that any PCM analysis would be testing for an evolutionary relationship between body size and these unobserved traits which predict extinction. Of course, if extinction risk is largely a function of non-inherited traits, then the initial assumption may be incorrect (that extinction risk itself is an evolving trait). Regardless, I don't see how to apply that explanation to the habitat degradation example. So, what do people think? How should we test for correlation when non-evolving quasi-traits are involved? I'm very interested to hear people's thoughts on this matter. -Dave Bapst, UChicago ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] Comparative Methods and Pseudo-Traits
Hi David. Poaching Intensity = beta0 + beta1*Body Size + e I think it depends on how the residual error in the model is distributed (esp. correlated) among species. It seems possible to invent hypothetical scenarios (as I did in my previous email) about how the residual error in poaching intensity given body size could be phylogenetically autocorrelated, but this is fundamentally an empirical question. If the residual error of poaching intensity given body size is phylogenetically correlated and we ignore this then we risk overestimating the predictive value of our model. In addition, the residual error is likely/guaranteed to be non-Brownian if the response variable is binary (e.g., extant v. extinct). For these type of data the tree should not be ignored, but simple GLS regression is probably not appropriate. One option might be the phylogenetic logistic regression of Ives Garland (2009), but I'm not too familiar with this method. All the best, Liam -- Liam J. Revell University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://phytools.blogspot.com On 11/14/2011 3:54 PM, David Bapst wrote: Liam, Joe, Pasquale, all- Thank you for your kind input.It seems that I am not the only one who considers this issue at length. There is just one point I'd like clarification of. Liam, in my first example which you used, the inherited trait is the response and the not-directly-inheritable trait the predictor, such that: Growth Rate = beta0 + beta1*Habitat Degradation + e What if we were to switch these? It seems to me this is the only real difference in the extinction example. To make a neontological example, let's say we were interested in whether larger mammals were more likely to experience higher levels of poaching. The response here would be the poaching intensity and body size the predictor, such that: Poaching Intensity = beta0 + beta1*Body Size + e Would your argument still apply? (It seems to me that it should.) If so, then it would seem your explanation should equally apply to extinction selectivity cases. -Dave On 11/10/2011 3:36 PM, David Bapst wrote: Hello all, A recent discussion set my mind thinking on a particular issue and, once again, I decided to ask for the general opinion of R-Sig-Phylo denizens. It may be easier to start with an example. Let's say that there exists a worker who is measuring several different traits across a number of species and then testing for correlations among these traits. The first test is body size versus growth rate and they use independent contrasts or PGLS to test for a the correlation, accounting for phylogeny. Both of these traits are inherited, evolving variables. Now let's say they'd like to test for the relationship between growth rate and some metric of the anthropogenic degradation of that species' habitat. Now what? It is even valid to apply PIC to the habitat degradation metric even though it is not an inherited, evolving trait? It's unclear to me. Let's consider a paleontological example, one which I have found myself both strongly agreeing and disagreeing with at times. Essentially, how should we test for extinction selectivity on some trait at a mass extinction event? Let's say we think body size is a predictor of the risk of extinction during that event and so we want to test for a correlation between them (please ignore that extinction would be a discrete variable for the moment). Do we treat these variable with PIC or PGLS? Is it really proper to refer to the probability of going extinct during a mass extinction as an evolving trait? Let's say we did and we got different results than when we used an analysis which did not account for the phylogenetic covariance. How should we interpret these results? One explanation I know of is that when we apply phylogenetic comparative methods to these quasi-traits to consider their relationship to another trait, we are assuming that these variables are actually the result of some underlying, unobserved set of traits which are evolving along the phylogeny. This makes sense, maybe in the extinction event case, which would mean that any PCM analysis would be testing for an evolutionary relationship between body size and these unobserved traits which predict extinction. Of course, if extinction risk is largely a function of non-inherited traits, then the initial assumption may be incorrect (that extinction risk itself is an evolving trait). Regardless, I don't
Re: [R-sig-phylo] covriance matrix internal nodes
Here is an alternative version that is more than twice as fast (helpful for large trees), and only a little bit slower than vcv.phylo(): vcvPhylo2-function(tree,anc.nodes=T){ n-length(tree$tip.label) h-nodeHeights(tree)[order(tree$edge[,2]),2] h-c(h[1:n],0,h[(n+1):length(h)]) M-mrca(tree,full=anc.nodes)[c(1:n,anc.nodes*(n+2:tree$Nnode)),c(1:n,anc.nodes*(n+2:tree$Nnode))] C-matrix(h[M],nrow(M),ncol(M)) if(anc.nodes) rownames(C)-colnames(C)-c(tree$tip.label,n+2:tree$Nnode) else rownames(C)-colnames(C)-tree$tip.label return(C) } All the best. - Liam -- Liam J. Revell University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://phytools.blogspot.com On 11/21/2011 6:16 PM, ppi...@uniroma3.it wrote: Hi Liam, it works perfectly! Thankyou very much! Andit is elegant... Hi Paolo. I think this code fragment from my phytools function anc.trend does what you want (here, I have made it into its own function), although it might not be the most elegant way to do it: vcvPhylo-function(phy,anc.nodes=T){ if(anc.nodes){ D-dist.nodes(phy) ntips-length(phy$tip.label) Cii-D[ntips+1,] C-D; C[,]-0 for(i in 1:nrow(D)) for(j in 1:ncol(D)) C[i,j]-(Cii[i]+Cii[j]-D[i,j])/2 dimnames(C)[[1]][1:length(phy$tip)]-dimnames(C)[[2]][1:length(phy$tip)]-phy$tip.label C-C[c(1:ntips,(ntips+2):nrow(C)),c(1:ntips,(ntips+2):ncol(C))] return(C) } else return(vcv.phylo(phy)) } All the best, Liam -- Liam J. Revell University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://phytools.blogspot.com On 11/21/2011 12:09 PM, ppi...@uniroma3.it wrote: Hi all, Having a tree (NOT ultrametric in my case) I would like to extract the phylogenetic covariance matrix including both tips and nodes (like dist.nodes for patristic distances). vcv returns the cov matric just for tips. Someone has a fast way? Thankyou very much Paolo ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] split data set and topology on PGLS
Hi Renata. I agree with Ted - you should be able to fit an ANCOVA model with gls(y~x+ecomorph,...,correlation=corBrownian(phy=tree)), or something like this, where ecomorph is a factor. Alternatively, you could use the method we devised in Revell Collar (2009; Evolution) in which we first reconstructed the discrete character on the tree (in your case, this would be ecomorph), and then we fit a model in which different covariance structure between our variables was permitted on different branches of the tree. This method is also implemented in my phytools package (function: evol.vcv). All the best, Liam -- Liam J. Revell University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://phytools.blogspot.com On 12/14/2011 3:26 PM, Theodore Garland Jr wrote: Why not just include an interaction term between your main effect coding variable (presumably coded as 0, 1 to indicate your two ecomorphs) and one or more of your continuous independent variables? Cheers, Ted Theodore Garland, Jr. Professor Department of Biology University of California, Riverside Riverside, CA 92521 Office Phone: (951) 827-3524 Home Phone: (951) 328-0820 Facsimile: (951) 827-4286 = Dept. office (not confidential) Email: tgarl...@ucr.edu http://www.biology.ucr.edu/people/faculty/Garland.html Experimental Evolution: Concepts, Methods, and Applications of Selection Experiments Edited by Theodore Garland, Jr. and Michael R. Rose http://www.ucpress.edu/book.php?isbn=9780520261808 (PDFs of chapters are available from me or from the individual authors) From: r-sig-phylo-boun...@r-project.org [r-sig-phylo-boun...@r-project.org] on behalf of Matt Pennell [mwpenn...@gmail.com] Sent: Wednesday, December 14, 2011 12:17 PM To: Renata Brandt Cc: r-sig-phylo@r-project.org Subject: Re: [R-sig-phylo] split data set and topology on PGLS Hi Renata, This seems to be an appropriate approach. (I am sure someone will correct me if i am wrong in this). Phylogenetic regression in the way that you are using it is basically to remove the statistical non-independence between data points. Effectively what you are doing by splitting your data into two morphologies is pruning your tree. Under Brownian motion, evolution along any branch is independent of evolution along any other branch, so pruning your tree will not change the correlation struction of the data. (Similarily you can still do PGLS with incomplete sampling of species). So as long as you can justify splitting the data up as you are, it seems fine. cheers, matt On Wed, Dec 14, 2011 at 12:09 PM, Renata Brandtrenata.bra...@gmail.comwrote: Hello everybody. I have some questions for you. When performing a phylogenetic regression between two continuous traits, the dependent trait clearly divides my data set in two distinct morphologies. This division overlaps with what I previously thought were two different ecomorphs. The relationship between the same traits seems to be different for each morphology. Now my question is: - Is it ok to split the data set (one data table and topology for each ecomorph) and reanalyze data to get the relationship between traits for each group? (That is what I would do if not using a phylogenetic approach). - If this is wrong, any hints on how should I proceed? Many thanks. Cheers. -- Renata Brandt Departamento de Biologia - FFCLRP Universidade de São Paulo Brasil (16) 82039533 /..) ( ( ) ) ( ( ) ) _( (_ ) ) \ ) \ [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] Measures of dissimilarity between phylogenetic trees
Hi Bruno. RF.dist in the phangorn package computes Robinson-Foulds distance (http://en.wikipedia.org/wiki/Robinson-Foulds_metric). This is not obvious, but (unless you want the position of the root to affect the score) you should probably unroot your trees before using RF.dist, e.g.: tr1-read.tree(text=(A,(B,C));) plot(tr1) tr2-read.tree(text=((A,B),C);) # tr1 tr2 have the same unrooted topology, yet: RF.dist(tr1,tr2) [1] 2 RF.dist(unroot(tr1),unroot(tr2)) [1] 0 In addition, phangorn:treedist and ape:dist.topo provide other metrics of tree distance. All the best, Liam -- Liam J. Revell University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://phytools.blogspot.com On 12/26/2011 6:27 PM, Bruno de Medeiros wrote: Hi everyone, I am simulating characters evolving in a phylogetic tree under a number of models and using those characters to build another tree using parsimony. I would like, then, to measure how similar are the recovered trees when compared to the original ones, taking only topology into account. Is there any function in the R packages to calculate tree distance metrics? I considered using TNT (the program that I am using to build phylogenies) to calculate SPR distances, but it would be much more convenient if there were a way of doing it in R. Thanks in advance, Bruno -- Bruno A. S. de Medeiros PhD Student - Farrell Lab Harvard University [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] phyDat format problem
Hi Jen. Try changing to a matrix first (from a data frame). [I.e., p-as.matrix(p); p2-phyDat(p,type=USER,levels=c(0,1)) ] I'm not sure why this works, but it seems to. - Liam More details below. When I do: p-matrix(c(1,0,0,0,1,0,1,0,1,0, 1,1,0,1,1,0,0,1,0,0, 1,0,0,1,0,1,0,0,1,0, 0,1,0,1,1,1,0,0,0,0, 0,0,0,1,1,1,1,0,1,0),5,10,byrow=T) dimnames(p)-list(c(A,B,C,D,E),paste(X,1:10,sep=)) p2-phyDat(p,type=USER,levels=c(0,1)) I get: p2 5 sequences with 10 character and 9 different site patterns. The states are 0 1 However: p-as.data.frame(p) p2-phyDat(p,type=USER,levels=c(0,1)) Gives me: p2 10 sequences with 5 character and 5 different site patterns. The states are 0 1 -- Liam J. Revell University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://phytools.blogspot.com On 1/9/2012 9:14 AM, J Greenwood wrote: Hi all, I am having a problem with getting parsimony scores in Phangorn and have found that the program is not reading in my datafile the way I expect it. I have 10 characters and 5 taxa, but phangorn seems to read this the opposite way round. Can anybody help? Details follow, thanks, Jen I made a test dataset to try and work out the problem which is the following: p-read.table(file=tab_phangorn.txt, header=T) p X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 A 1 0 0 0 1 0 1 0 1 0 B 1 1 0 1 1 0 0 1 0 0 C 1 0 0 1 0 1 0 0 1 0 D 0 1 0 1 1 1 0 0 0 0 E 0 0 0 1 1 1 1 0 1 0 Where A-E are taxa and X1-X10 are characters. I then tried converting it to phyDat format which gives the following: p2-phyDat(p, type=USER, levels=c(0,1)) p2 10 sequences with 5 character and 5 different site patterns. The states are 0 1 str(p2) List of 10 $ X1 : int [1:5] 2 2 2 1 1 $ X2 : int [1:5] 1 2 1 2 1 $ X3 : int [1:5] 1 1 1 1 1 $ X4 : int [1:5] 1 2 2 2 2 $ X5 : int [1:5] 2 2 1 2 2 $ X6 : int [1:5] 1 1 2 2 2 $ X7 : int [1:5] 2 1 1 1 2 $ X8 : int [1:5] 1 2 1 1 1 $ X9 : int [1:5] 2 1 2 1 2 $ X10: int [1:5] 1 1 1 1 1 - attr(*, class)= chr phyDat - attr(*, weight)= int [1:5] 1 1 1 1 1 - attr(*, nr)= int 5 - attr(*, nc)= int 2 - attr(*, index)= int [1:5] 1 2 3 4 5 - attr(*, levels)= num [1:2] 0 1 - attr(*, allLevels)= chr [1:3] 0 1 ? - attr(*, type)= chr USER - attr(*, contrast)= num [1:3, 1:2] 1 0 1 0 1 1 It has read the data in the opposite way round to the way i was expecting, which was odd, but I transposed the matrix anyway, but got the same output: ptrans-t(p) ptrans A B C D E X1 1 1 1 0 0 X2 0 1 0 1 0 X3 0 0 0 0 0 X4 0 1 1 1 1 X5 1 1 0 1 1 X6 0 0 1 1 1 X7 1 0 0 0 1 X8 0 1 0 0 0 X9 1 0 1 0 1 X10 0 0 0 0 0 ptrans2 10 sequences with 5 character and 5 different site patterns. The states are 0 1 str(ptrans2) List of 10 $ X1 : int [1:5] 2 2 2 1 1 $ X2 : int [1:5] 1 2 1 2 1 $ X3 : int [1:5] 1 1 1 1 1 $ X4 : int [1:5] 1 2 2 2 2 $ X5 : int [1:5] 2 2 1 2 2 $ X6 : int [1:5] 1 1 2 2 2 $ X7 : int [1:5] 2 1 1 1 2 $ X8 : int [1:5] 1 2 1 1 1 $ X9 : int [1:5] 2 1 2 1 2 $ X10: int [1:5] 1 1 1 1 1 - attr(*, class)= chr phyDat - attr(*, weight)= int [1:5] 1 1 1 1 1 - attr(*, nr)= int 5 - attr(*, nc)= int 2 - attr(*, index)= int [1:5] 1 2 3 4 5 - attr(*, levels)= num [1:2] 0 1 - attr(*, allLevels)= chr [1:3] 0 1 ? - attr(*, type)= chr USER - attr(*, contrast)= num [1:3, 1:2] 1 0 1 0 1 1 Can anyone explain to me what I am doing wrong? How can I get the phyDat format to hold my information correctly? -- J Greenwood jenny.greenw...@bristol.ac.uk ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] Lambda; PIC vs. PGLS
Hi Karin. gls(...,correlation=corPagel(...,fixed=F),method=ML) and pgls(...,lambda=ML) should produce the same result unless one or the other is not optimizing properly or unless different bounds on the fitted model parameters (in particular, lambda) are used. To show this to yourself, you can try the following example which simulates and then fits the same model using three different functions. I hope this is helpful. - Liam set.seed(1) require(phytools); require(caper); require(nlme); require(geiger) # simulate tree tree-pbtree(n=100,scale=1) # simulate x under BM x-fastBM(tree) # simulate y residual error under the lambda model y-0.4*x+fastBM(lambdaTree(tree,0.7),sig2=0.5) data-as.data.frame(cbind(x,y)) data-cbind(names(x),data) colnames(data)-c(Species,x,y) # fit using nlme:gls r1-gls(y~x,data=data,correlation=corPagel(0.5,tree,fixed=FALSE),method=ML) # fit using caper:pgls r2-pgls(y~x,data=comparative.data(tree,data,Species),lambda=ML) # fit using phytools:phyl.resid r3-phyl.resid(tree,x,y,method=lambda) -- Liam J. Revell University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://phytools.blogspot.com On 1/13/2012 6:04 AM, Karin Schneeberger wrote: Dear all I am currently analysing a set of traits from different species - let's say brain size and body mass - using a tree where branch lengths were set artificially (after Graphen) as the length was not available. I calculate phylogenetic independent contrasts (PIC) using the package caper and the crunch-algorithm. Beforehand, I used the function caic.diagnostic to test whether the model is appropriate (which was the case). In the end, this provided me with a significant correlation between brain size and body mass. I then calculate lambda as a measurement for phylogenetic signal, using the package ape: data- comparative.data(tree, data, Species) result- gls(brainsize~bodymass, data = data, correlation = corPagel(value = 1, phy = tree, fixed = FALSE), method = ML) summary(result) Here, lambda was 1, which - as far as I understood - means evolution under Brownian motion. I perform a phylogenetic generalised least square (PGLS) regression using caper, which includes lambda, as an addition to the PIC analysis: data- comparative.data(tree, data, Species, vcv = TRUE) Mod1- pgls(brainsize~bodymass, data = data, lambda = ML) summary(Mod1) Surprisingly, lambda here was zero. Furthermore, the correlation between brain size and body mass became non-significant. I now have two contradiction values for lambda and results from both PIC and PGLS that differ dramatically in the way they can be interpreted. How does it come that with one method, lambda is zero and with the other lambda is one? And in the end, which method should be used, PIC or PGLS, or should both be discussed? I would be grateful for advice.Thank you very much in advance for your help. Best regards Karin Schneeberger PhD candidate in Ecology and Evolution [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] Blomberg K Statistic
Hi Daniel. I can think of two possibilities: 1) This is an error. 2) Your tree has an extremely unusual shape (for instance, it has many bifurcations very close to the present day). On constant rate Yule trees, substituting 1s for the true branch lengths does not (on average) bias K in any particular direction. If you share your tree and data I would be happy to look at the issue more closely. All the best, Liam -- Liam J. Revell University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://phytools.blogspot.com On 1/19/2012 9:53 PM, dwe...@life.illinois.edu wrote: Hi, I'm using the Picante package in R to calculate the Blomberg K. I'm trying to test for a phylogenetic signal in my data. I'm using a dataset with the PC score for 21 different species on a tree with varying branch lengths. I did this using the code here: http://bodegaphylo.wikispot.org/IV._Testing_Phylogenetic_Signal_in_R I got a K statistic of 0.0006457603 and yet the resultant PIC.variance.P is 0.001. If I'm correct, the PIC.variance.P is the p-value and significant at 0.05. Is that correct? I then decided to play with my data and set all branch lengths equal to 1 to see how that would affect things. I got a K statistic of 2.921763 and a PIC.variance.P value of 0.001. My understanding is that K statistics close to 0 are not significant and become significant close to and above 1. I'm confused why I got a significant p-value for both K statistics, even though one is very close to 0 (0.00065) and the other is above 2. I would appreciate any help in explaining Picante and the K statistic. Thank you. -Daniel º -º º º º Daniel P Welsh PhD Candidate Teaching Assistant Department of Animal Biology University of Illinois at Urbana-Champaign 202 Shelford Vivarium 606 E. Healey Street Champaign, IL 61821 lab phone: (217) 333-5323 ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] Bug in geiger::fitContinuous with meserr
Hi all. If you don't want to apply the fix, you can circumvent the problem by first assuring that meserr is in the same order as the tree tip labels, and then removing the names from the input vector of standard errors. So, for instance: se-se[tree$tip.label] names(se)-NULL fitContinuous(tree,x,model=lambda,meserr=se) works. Note that the order of meserr should be, so far as I can tell, the order of tree$tip.label and not the order of the data in x. This is because the internally called function treedata(...,sort=T) sorts the data into the order of the tip labels. This also agrees with Matt's diagnosis of the problem because if is.null(names(meserr)) then the code with the bug is circumvented. In theory, one should be able to supply meserr as a matrix; unfortunately there is a second bug whereby: o-match(rownames(td$data),rownames(meserr)) meserr-meserr[o,] should be changed to o-match(rownames(td$data),rownames(meserr)) meserr-as.matrix(meserr[o,]) because the former inadvertently converts meserr to a vector, when a matrix is subsequently needed. - Liam -- Liam J. Revell University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://phytools.blogspot.com On 2/2/2012 1:41 PM, Matt Johnson wrote: I'm not sure if this is the best place to make a bug report, but I thought others might be interested. I was trying to replicate, with geiger, a demonstration of estimating lambda with measurement error from Liam's blog: http://phytools.blogspot.com/2011/11/including-measurement-error-in.html Replicating this with fitContinuous, I received the following error: fitContinuous(tree,xe,model=lambda,meserr=se) Error in meserr[o, ] : incorrect number of dimensions The problem is related to this bit of code in fitContinuous, used to determine whether meserr is a single value, a vector, or a matrix: else if (is.vector(meserr)) { if (!is.null(names(meserr))) { o- match(rownames(td$data), names(meserr)) if (length(o) != ntax) stop(meserr is missing some taxa from the tree) meserr- as.matrix(meserr[o,]) Removing the comma from the final line fixed the problem. The values I receive from geiger after removing the comma are identical to the values I get from phytools::phylosig for the same data, so I am assuming I haven't broken anything important. The full code I used is at the end of this message, for those interested. Best, Matt Johnson library(geiger) Loading required package: ape Loading required package: MASS Loading required package: mvtnorm Loading required package: msm Loading required package: ouch Loading required package: subplex library(phytools) Loading required package: mnormt Loading required package: phangorn Loading required package: quadprog Loading required package: igraph Loading required package: rgl source(mjContinuous.R) #fitContinuous with offending comma removed set.seed(2) gen.lambda=0.7 tree = drop.tip(birthdeath.tree(b=1,d=0,taxa.stop=501),501) se = sqrt(rchisq(n=500,df=2)) names(se) = tree$tip.label x= fastBM(lambdaTree(tree,gen.lambda),sig2=1) e = rnorm(n=500,sd=se) xe=x+e fitContinuous(tree,x,model=lambda) Fitting lambda model: $Trait1 $Trait1$lnl [1] -1031.783 $Trait1$beta [1] 1.221674 $Trait1$lambda [1] 0.7613218 $Trait1$aic [1] 2069.567 $Trait1$aicc [1] 2069.615 $Trait1$k [1] 3 fitContinuous(tree,xe,model=lambda) Fitting lambda model: $Trait1 $Trait1$lnl [1] -1154.987 $Trait1$beta [1] 1.412728 $Trait1$lambda [1] 0.5495113 $Trait1$aic [1] 2315.975 $Trait1$aicc [1] 2316.023 $Trait1$k [1] 3 fitContinuous(tree,xe,model=lambda,meserr=se) Error in meserr[o, ] : incorrect number of dimensions mjContinuous(tree,xe,model=lambda,meserr=se) #with problematic comma removed Fitting lambda model: $Trait1 $Trait1$lnl [1] -1141.14 $Trait1$beta [1] 1.170196 $Trait1$lambda [1] 0.7484273 $Trait1$aic [1] 2288.281 $Trait1$aicc [1] 2288.329 $Trait1$k [1] 3 ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] Reconstructing Ancestral Characters and Exporting Data
Hi Amy. Yes, if you wanted to export ancestral character values as node labels, you can. For instance: temp-ace(x,tree) tree$node.label-round(temp$ace,5) # for instance write.tree(tree,file=) You could also loop this over the set of trees in a multiPhylo object. I hope this helps. - Liam -- Liam J. Revell University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://phytools.blogspot.com On 2/8/2012 1:59 PM, Amy Boddy wrote: Hello, I was wondering if somebody could help me with a problem I am having in APE. I want to reconstruct the ancestral states of some continuous characters for over 600 mammalian species. I was able to reconstruct the ancestral states with the ace function:ace_body = ace(data$LogBody, tree, type=continuous and then plot them on my tree: nodelabels(round(ace_body$ace, digits=1)) However, my problem is that my data tree is very large, so plotting is a bit messy. My tree file does not have the internal nodes labeled, and I know that they are assigned a number to them (ex. if i have 600 species, the nodes are numbered starting with 601, 602 etc). So my question is, what is the best way to export this information? Is there a function to write this tree with the node labels (i.e. the ancestral reconstructions?)? Any input would be helpful and very much appreciated. Thank you! -Amy [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] differing sigma2 results
Hi Pascal. The first clue is the the length of your tree is exactly the same as the ratio of your two sets of rates. This suggests that OUCH is first rescaling the tree to unit length. Let's see if this explains your results, just working with your first character: tree-read.nexus(testtree.nex) X-read.csv(spdata.csv,header=T,row.names=1) X-as.matrix(X[,1:4]) fitContinuous(tree,X[,1]) Fitting BM model: $Trait1 $Trait1$lnl [1] -164.5281 $Trait1$beta [1] 0.1788076 $Trait1$aic [1] 333.0561 $Trait1$aicc [1] 333.208 $Trait1$k [1] 2 fitContinuous(rescaleTree(tree,1),X[,1]) Fitting BM model: $Trait1 $Trait1$lnl [1] -164.5281 $Trait1$beta [1] 5.885607 $Trait1$aic [1] 333.0561 $Trait1$aicc [1] 333.208 $Trait1$k [1] 2 Looks like that's your explanation. Note that the first rate from fitContinuous on the tree in its original scale is slightly different from what you report: off by a factor of n/(n-1), in fact. This is because ic.sigma returns the unbiased REML estimates of the rate (from Felsenstein's contrasts); whereas the ML rates from fitContinuous and OUCH are biased by a factor of (n-1)/n. Note also that the likelihoods (and thus the AIC scores) are not affected by rescaling the tree. This is because scaling the branches by k the rate sigma^2 by 1/k yields exactly the same log-likelihood. This is different from the effect of rescaling the data which affects the likelihood, but should not alter the relative likelihoods of different models fit to the same rescaled data! All the best, Liam -- Liam J. Revell University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://phytools.blogspot.com On 2/11/2012 10:20 AM, Pascal Title wrote: Hello all I am trying to calculate sigma2 values for a clade, using 4 PCA axes as my characters. I first wanted to figure out if my data better fit a BM model or an OU1 model of evolution. So I compared AICc values using the fitContinuous function in Geiger. I found the following: BM OU Score1 2.6 0 Score2 9.1 0 Score3 17.2 0 Score4 9.8 0 However, using the pmc package, I ran pmc(tree,data,modelA='BM',modelB='OU',nboot=100) for the first PC axis and found that the BM and OU1 distributions are mostly overlapping, which I believe tells me that you can't really differentiate BM and OU with the data at hand (see plot of pmchttp://dl.dropbox.com/u/34644229/Tangarinae.pdf ). Therefore, I was planning on calculating sigma2 based on a BM model of evolution. But I noticed that I get hugely different results using different methods: (here is my treehttp://dl.dropbox.com/u/34644229/testtree.nex, here is my datahttp://dl.dropbox.com/u/34644229/spdata.csv) With OUCH method: require(geiger) require(ouch) tree-read.nexus('testtree.nex') spdata-read.csv('spdata.csv',stringsAsFactors=FALSE) rownames(spdata)-spdata$X spdata$X-NULL spdata$labels-NULL spdata-spdata1 ot-ape2ouch(tree) otd-as(ot,data.frame) spdata$labels-rownames(spdata) otd-merge(otd,spdata,by=labels,all=TRUE) rownames(otd)-otd$nodes ot-with(otd,ouchtree(nodes=nodes,ancestors=ancestors,times=times,labels=labels)) b1-brown(tree=ot,data=otd[colnames(spdata1)]) sigma2-coef(b1)$sigma.sq.matrix sigma2 [,1] [,2][,3] [,4] [1,] 5.885605498 1.17240962 0.7760799 -0.004528573 [2,] 1.172409617 1.52861569 0.1039507 -0.073455526 [3,] 0.776079917 0.10395074 1.0567364 -0.158017246 [4,] -0.004528573 -0.07345553 -0.1580172 1.236342488 With ic.sigma method from GEIGER: require(geiger) tree-read.nexus('testtree.nex') spdata-read.csv('spdata.csv',stringsAsFactors=FALSE) rownames(spdata)-spdata$X spdata$X-NULL spdata$labels-NULL sig-ic.sigma(tree,spdata) Score1 Score2 Score3 Score4 Score1 0.1809881293 0.036052743 0.023865217 -0.0001392581 Score2 0.0360527432 0.047006429 0.003196587 -0.0022588293 Score3 0.0238652170 0.003196587 0.032495680 -0.0048591850 Score4 -0.0001392581 -0.002258829 -0.004859185 0.0380187415 Interestingly, the OUCH result is greater than the GEIGER result by a factor of 32. I also simulated a tree and data under BM and tried both methods and they agree, so somehow this has something to do with my data. Both methods are estimating these parameters under brownian motion, so where is there a breakdown? Any advice would be greatly appreciated! ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] problem calculating independent contrasts
Graham is absolutely right. If you did this you would find that Chlorocebus pygerythrus has an underscore separating genus specific epithet in your tree, but not in your data vector: require(geiger) name.check(tree,x) $Tree.not.data [1] Chlorocebus_pygerythrus $Data.not.tree [1] Chlorocebus pygerythrus - Liam -- Liam J. Revell University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://phytools.blogspot.com On 2/20/2012 7:30 PM, Graham Slater wrote: Hi Tom, have you tried running name.check(t.100species, log_repertoire_data) to confirm that the names perfectly match in both? If there is even a slight mismatch, then pic will ignore all the names in the data and assume that they are in the same order as the tip labels in the tree. Thus the pics returned would be random. Graham Graham Slater Department of Ecology and Evolutionary Biology University of California, Los Angeles 621 Charles E Young Drive South Los Angeles CA 90095-1606 (310) 825-4669 gsla...@ucla.edu www.eeb.ucla.edu/gslater On Feb 20, 2012, at 3:54 PM, Tom Schoenemann wrote: Hello, I keep getting the following error when trying to calculate independent contrasts: pic.log_repertoire_data- pic(log_repertoire_data, t.100species) Warning message: In pic(log_repertoire_data, t.100species) : the names of argument 'x' and the tip labels of the tree did not match: the former were ignored in the analysis. However, unless I'm misunderstanding what this means, then it is not correct. My data is in: log_repertoire_data Alouatta_palliata Alouatta_seniculus Aotus_nigricepsAotus_trivirgatus 0.778151 0.778151 0.778151 0.778151 Ateles_belzebuth Ateles_fusciceps Ateles_geoffroyi Brachyteles_arachnoides 0.778151 0.954243 1.322219 0.903090 Cacajao_calvusCallicebus_moloch Callicebus_torquatusCallimico_goeldii 1.079181 1.041393 0.845098 0.845098 Callithrix_jacchus Callithrix_penicillata Callithrix_pygmaea Cebus_capucinus 0.954243 0.602060 1.176091 1.414973 Cebus_olivaceus Lagothrix_lagotricha Leontopithecus_rosaliaPithecia_pithecia 1.079181 1.146128 1.00 1.00 Saguinus_fuscicollis Saguinus_geoffroyi Saguinus_midas Saguinus_oedipus 1.113943 1.00 0.903090 0.954243 Saimiri_oerstedii Saimiri_sciureus Cercocebus_torquatus_atys Cercopithecus_ascanius 0.602060 1.301030 1.079181 0.845098 Cercopithecus_campbelli Cercopithecus_cephus Cercopithecus_mitis Cercopithecus_neglectus 1.176091 1.204120 0.845098 0.778151 Cercopithecus_pogonias Chlorocebus_aethiops Colobus_angolensis_palliatus Colobus_guereza 1.230449 1.322219 0.903090 0.845098 Colobus_polykomos Erythrocebus_patas Lophocebus_albigena Macaca_arctoides 0.903090 1.079181 1.079181 1.230449 Macaca_fascicularis Macaca_mulatta Macaca_nemestrina Macaca_radiata 1.176091 1.204120 1.371068 1.397940 Macaca_silenus Macaca_sylvanus Mandrillus_leucophaeusMandrillus_sphinx 1.322219 1.041393 1.041393 1.00 Miopithecus_talapoin Nasalis_larvatus Papio_anubis Papio_cynocephalus 1.230449 0.698970 1.204120 1.00 Papio_hamadryas Papio_papio Piliocolobus_badius Presbytis_comata 0.477121 1.176091 1.079181 1.041393 Procolobus_verus Semnopithecus_entellus
Re: [R-sig-phylo] convergence issues with hansen model
Hi Rafael. I don't want to speak on behalf of the authors who are also on this list, but OUwie can read in modified phylo objects created by the phytools functions read.simmap make.simmap. All the best, Liam -- Liam J. Revell University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://phytools.blogspot.com On 3/13/2012 4:35 PM, Rafael Maia wrote: Speaking of OUwie, the paper just came out today: Jeremy M. Beaulieu, Dwueng-Chwuan Jhwueng, Carl Boettiger Brian C. O’Meara. MODELING STABILIZING SELECTION: EXPANDING THE ORNSTEIN-UHLENBECK MODEL OF ADAPTIVE EVOLUTION. Evolution, Accepted Article, DOI: 10./j.1558-5646.2012.01619.x (link: http://onlinelibrary.wiley.com/doi/10./j.1558-5646.2012.01619.x/abstract ) Has anyone tried it out yet? Is there a way of converting SIMMAP trees to the format to be used with the OUwie function? Thanks! Abraços, Rafael Maia --- webpage: http://gozips.uakron.edu/~rm72 A little learning is a dangerous thing; drink deep, or taste not the Pierian spring. (A. Pope) Graduate Student - Integrated Bioscience University of Akron http://gozips.uakron.edu/~shawkey/ On Mar 12, 2012, at 2:39 PM, Liam J. Revell wrote: You might want to try the OUWie package by Beaulieu O'Meara (http://cran.r-project.org/web/packages/OUwie/index.html). I have not tried it yet, but it promises to do multi-optimum OU model fitting as well. All the best, Liam -- Liam J. Revell University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu mailto:liam.rev...@umb.edu blog: http://phytools.blogspot.com On 3/12/2012 2:31 PM, Pascal Title wrote: Hi all I have a dataset of 5 PC axes for a phylogeny with 52 tips and I would like to get the variance-covariance matrix under a global OU model of evolution. I find that OU is strongly favored over BM, based on fitContinuous in GEIGER. However, I am having issues with convergence when trying to fit a hansen model to my data. What can I do to get around this problem? Alternatively, is there another way to get at the multivariate VCV matrix other than with the OUCH package? The error is: *unsuccessful convergence, code 1, see documentation for `optim'* *Warning message:* *In hansen(tree = ot, data = otd[varnames], regimes = otd[regimes], :* * unsuccessful convergence* I have posted a small file that contains the following R script along with the data herehttp://dl.dropbox.com/u/34644229/OUhansen.zip. Any advice would be greatly appreciated! Thanks! require(ouch) require(ape) tree-read.nexus('tree.nex') data-read.csv('data.csv') rownames(data)-data$X data$X-NULL tree head(data) colnames(data)-varnames #for hansen() ot-ape2ouch(tree) otd-as(ot,data.frame) data$labels-rownames(data) otd-merge(otd,data,by=labels,all=TRUE) rownames(otd)-otd$nodes ot-with(otd,ouchtree(nodes=nodes,ancestors=ancestors,times=times,labels=labels)) otd$regimes-as.factor(global) h1-hansen(tree=ot,data=otd[varnames],regimes=otd[regimes],sqrt.alpha=c(1,0,0,0,0,1,0,0,0,1,0,0,1,0,1),sigma=c(1,0,0,0,0,1,0,0,0,1,0,0,1,0,1)) ___ R-sig-phylo mailing list R-sig-phylo@r-project.org mailto:R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] estimate ancestral states of specific internal nodes
Hi Annemarie. If you know the tip names for all the species descended from the target node in each tree in your sample, you could use findMRCA in the phytools package. findMRCA uses ape's mrca function to get the MRCA of a set of tips. So, for instance, if your target node is the MRCA of taxa labeled t1, t3, and t8, with data in the vector x, you could do the following: sp-c(t1,t3,t8) y-ace(x,tree) target.state-y$ace[as.character(findMRCA(tree,sp))] Of course, looping this over a set of trees is simple, for instance, using sapply: sp-c(t1,t3,t8) target.state-sapply(trees,function(a,x,sp) ace(x,a)$ace[as.character(findMRCA(a,sp))],x=x,sp=sp) Or, of course, you could just as easily use a simple for loop. Perhaps this helps? All the best, Liam -- Liam J. Revell University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://phytools.blogspot.com On 5/8/2012 7:44 PM, Annemarie Verkerk wrote: Dear r-sig-phylo users, I have a question regarding the retrieval of ancestral states of internal nodes in a sample of trees. I am doing ancestral state estimations on a large sample of trees (using ace() in ape), but a few groups of languages are present in all my trees. I would like to know how to easily retrieve the ancestral state of the internal node leading to such a group of languages, given that I know the numbers of the edges leading to those language tips, but the number of the internal edge leading to those languages (and thus defining the grouping / clade) will not be the same in all of the trees. This would be very easy to do in BayesTraits using 'addnode', but I really want to use the 'ML' version of ace(), which is not included in BayesTraits. Does anyone have an idea? (Sorry if this has been asked before, I could not find anything in the archives.) Many thanks, Annemarie ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] more problems with branch names
Looks like this may be because your row names in your data frame have spaces, while your tree tip labels do not. Try: rownames(example)-sub( ,,rownames(example)) example-example[tree$tip.label,] Help? - Liam -- Liam J. Revell University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://phytools.blogspot.com On 5/9/2012 5:13 PM, Agus Camacho wrote: Dear all, I had the same problem as T. Gamble and tried the advice from L. Revell and Rich over my own tree (attached), all of the hints produced a df with Nas. Also, I cant see any difference between rownames on my matrix and tree tip labels... Could anyone give me a hint about what i am doing wrong here? Thanks in advance, Agus PS: There it goes what I did: First from Liam's hint: example- read.csv(example.csv,header=TRUE,row.names=1) example vara varb C. leiolepis 1a C. nicterus 2b C. sinebrachiatus3c S. catimbau 4a N. ablephara 5b P. erythrocercus 6c P. tetradactylus 7a V. rubricauda8b V. rubricaudavac 9c M. maximiliani 10a P. paeminosus 11b tree$tip.label [1]C.leiolepis C.nicterusC.sinebrachiatus S.catimbau [5]N.ablephara P.erythrocercus P.tetradactylus V.rubricauda [9]V.rubricaudavac M.maximiliani P.paeminosus example-example[tree$tip.label,] example vara varb NA NANA NA.1NANA NA.2NANA NA.3NANA NA.4NANA NA.5NANA NA.6NANA NA.7NANA NA.8NANA NA.9NANA NA.10 NANA Now, Rich's hint: match(tree$tip.label, rownames(example)) - match match [1] NA NA NA NA NA NA NA NA NA NA NA -- Agustín Camacho Guerrero. Doutorando em Zoologia. Laboratório de Herpetologia, Departamento de Zoologia, Instituto de Biociências, USP. Rua do Matão, trav. 14, nº 321, Cidade Universitária, São Paulo - SP, CEP: 05508-090, Brasil. ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] Non-parametric alternative to phylogenetic ANOVA?
Hi Karin. GLS with x as a factor is a generalized ANOVA which assumes [in the case of gls(...,correlation=corBrownian)] that the residual error in the ANOVA model has evolved by Brownian evolution. If you read your data into data frame Z with row names as species names, for instance: Z-read.table(filename,header=T,row.names=1) tree-read.tree(treefile) and your column name for the factor is x the column name for the continuous response variable is y, then you should just be able to do: fit-gls(y~x,data=Z,correlation=corBrownian(1,tree)) You can then perform various posthoc analyses from the gls object that is produced. For instance summary(fit) anova(fit) residuals(fit) As pointed out by Alejandro, you should check for normality of the residuals in residuals(fit) - not the normality of y before analysis. summary(fit) will also give you parameter estimated (fitted means for each factor) and standard errors. These can be used to conduct posthoc comparison of means using t-tests in the standard way. I hope this helps. All the best, Liam -- Liam J. Revell University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://phytools.blogspot.com On 5/30/2012 10:46 AM, Karin Schneeberger wrote: Hi Alejandro Thank you for the very quick answer. I tried PGLS before, but then was told that GLS is not suitable for multistate categorical variables that can not be ranked (otherwise I would treat them as continuous). Also, with GLS it's as far as I understood not possible to state statistically whether certain groups are greater than others. But I am new into this kind of analysis and am very happy for any help and explanation, as I might be totally wrong. Cheers, Karin Von: Alejandro Gonzalezalejandro.gonza...@ebd.csic.es CC: r-sig-phylo@r-project.orgr-sig-phylo@r-project.org Gesendet: 16:26 Mittwoch, 30.Mai 2012 Betreff: Re: [R-sig-phylo] Non-parametric alternative to phylogenetic ANOVA? Hi Karin, You could use a gls method and look at the distribution of your residuals. It is the residuals which must be normally distributed, which can be checked using diagnostic plots such as a histogram or qq-plot of the residuals of your model. Cheers Alejandro On 30, May 2012, at 4:12 PM, Karin Schneeberger wrote: Dear all I'm trying to compare one trait across three (unordered categorical) groups including 25 species (let's say for example basal metabolic rate of aquatic, terrestrial and aerial mammals). If the data would be normally distributed, I would simply use a phylogenetic ANOVA including a post-hoc test on means accounting for the phylogeny, as provided by the package phytools. However, my tip-data are far away from being normally distributed, and transformations only lead to unsatisfying improvements (e.g Shapiro-Wilk test returns a p-Value of 0.08 instead of 0.03 as before transformation). So I am not convinced that a phylogenetic ANOVA is the right approach for dealing with these sort of data. Is there any non-parametric approach to compare groups across phylogeny that also returns which groups differ from each other? Your sincerely, Karin [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo __ Alejandro Gonzalez Voyer Post-doc Estación Biológica de Doñana Consejo Superior de Investigaciones Científicas (CSIC) Av Américo Vespucio s/n 41092 Sevilla Spain Tel: + 34 - 954 466700, ext 1749 E-mail: alejandro.gonza...@ebd.csic.es Web site (Under construction): Personal page: http://consevol.org/members/alejandro.html Group page: http://consevol.org/index.html For PDF copies of papers see: http://csic.academia.edu/AlejandroGonzalezVoyer [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] Question about Kcalc with 1 data points per species
Hi Jonathan. The thing to do in this case is to estimate K with within-species variability. This is described in Ives et al. (2007; Syst. Biol.) and implemented in the phytools function phylosig. This will give an unbiased estimate of K. (K estimated when within-species variability is ignored is downwardly biased.) To do this, you need the species means for each species and a vector of the standard error of the means. If we cannot estimate the standard error of the mean for some species (because n=1), then we can just use the mean variance. To get the means and standard errors (assuming your raw data is in a vector, x, with species names), we can just do the following: # get the mean by species temp-aggregate(x,by=list(names(x)),mean) xbar-temp[,2]; names(xbar)-temp[,1] # get the variance by species temp-aggregate(x,by=list(names(x)),var) xvar-temp[,2]; names(xvar)-temp[,1] # replace NA with mean variance xvar[is.na(xvar)]-mean(xvar,na.rm=TRUE) # get the N per species n-as.vector(table(names(y))) # compute the standard errors se-sqrt(xvar/n) # compute K K-phylosig(tree,xbar,se=se) Hopefully that works for you. All the best, Liam -- Liam J. Revell University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://phytools.blogspot.com On 6/6/2012 6:12 PM, Jonathan Benstead wrote: Dear r-sig-phylo members - Colleagues and I came across an old post about calculating the Blomberg K statistic in studies where there is more than one data point for some species, by using a tree that includes zero-branch lengths for those multiple studies. This seems ideal for our data, which has several species with up to 14 data points for trait values. We have prepared a tree with the necessary zero-branch lengths and our tip labels match our row labels. However, when I run the code I get the following error message: Error in Initialize.corSymm(X[[1L]], ...) : Initial values for corSymm must be between -1 and 1 I found two references to this problem among old posts, but nothing that helped solve the problem, except that it might be something to do with the tree. As someone else has pointed out, the problem goes away if we change the zero branch lengths to a positive number, but we don't want to give these branches an arbitrary length. Is there another way to do this? Here's our code: library(ape) library(picante) Fish = read.table(Fish.csv, header = TRUE, sep = ,) AuthP2-as.matrix(as.matrix(Fish)[,1]) tree1 = read.nexus(consensus_constraint_polytomies.nex) tree2 = drop.tip(tree1, Erpetoichthys_calabaricus_AY442348_) names(AuthP2)-tree2$tip.label Kcalc(AuthP2[tree2$tip.label], tree2) Any help getting this analysis to work would be greatly appreciated. Jon Jon Benstead Associate Professor Aquatic Biology Program University of Alabama, Box 870206 1124 Bevill Building, 201 7th Ave. Tuscaloosa, Alabama 35487-0206 Lab homepage: http://bama.ua.edu/~jbenstead Iceland blog: http://icelandstreams.blogspot.com/ Tel: 205-348-9034 Fax: 205-348-1403 ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] Error estimating RMA in phytools
Hi Oscar. My educated guess after some investigating is this is because the objects Chem2mean and Morpho2mean are data frames not matrices; and thus Chem2mean[,1] does not contain names. Try first doing: Chem2mean-as.matrix(Chem2mean), and the same with Morpho2mean, and then repeat the analysis. You might also get this error (or one quite similar) if you tree has zero-length terminal edges or y=k*x for non-zero scalar constant k (i.e., r(x,y)==1 or -1). All the best, Liam -- Liam J. Revell University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://phytools.blogspot.com On 6/22/2012 12:17 PM, Oscar Valverde wrote: Hi I am working on the correlation among traits of 36 plants. I am using the function phyl.RMA to estimate relationship between pairs of traits but I keep having a error message CNvrSRLphylo-phyl.RMA(Chem2mean[,1],Morpho2mean[,1],treeARP,method=lambda) Error in solve.default(kronecker(R, C)) : system is computationally singular: reciprocal condition number = 0 I tried running the code provided by phytools for this function and works fine. So it must be something about my tree that is not working. Thanks in advance for the help On Fri, Apr 20, 2012 at 12:19 PM, Jarrod Hadfieldj.hadfi...@ed.ac.uk wrote: Hi, The warning: 1: glm.fit: algorithm did not converge is from a standard glm that MCMCglmm uses to obtain semi-reasonable starting values. For a three (I think this is correct for your data?) category response the starting values are obtained from 2 binomial glms (the presence of category 2 and the presence of category 3). The fact that a simple glm did not converge indicates a problem. You seem to suggest the warning only appears with phylogenetic models, which seems odd. Is this correct? If so, try the argument start=list(QUASI=FALSE) in MCMCglmm and report back. If not, I would try and diagnose the problem using a simple binomial glm for each of the two categories in order to pinpoint the problem. The most likely problem is complete separation whereby some categories are not represented in some environments. Is this possible? With completely separated data MCMCglmm (and other software) is likely to give spurious results unless something is done. One possibility is to fit a more informative prior on the fixed effects which is approximately flat on the probability scale (See the last half of section 2.6 in the CourseNotes). Cheers, Jarrod Quoting Rafael Rubio de Casasr...@nescent.org on Tue, 17 Apr 2012 17:15:17 -0400: Dear R-Sig-Phylo users, I am trying to use MCMCglmm to analyze the evolution of a discrete character and the influence of the environment on it, but I keep on having problems. All my mixed models give crazy results, with chains that do not converge. I always get teh following error message, which I believe is probably at the root of the problem Warning messages: 1: glm.fit: algorithm did not converge However, when I fit a fixed effects model (no phylogenetic correction), everything seems to go fine, and it is when I incorporate the phylogeny that things fall apart. I keep on getting the warning message, regardless of the priors (although there might be something to improve there). I was hoping somebody could give some advice. So here is a detailed description of what I am doing: My data consists of a trait with three levels, a set of environmental parameters and a phylogeny that describes the evolutionary relationships among the different taxa. Let the data be stored in a dataframe df, including taxa names as taxa, the trait of interest as a factor with three states char, and the environmental parameter a continuous variable env. The phylogeny is an ultrametric tree phy. Below I include a simulation that creates a similar dataset but that for some reason is not problematic. My actual dataset is very big and somewhat messy. If someone is interested I can send it. The code I'm using is as follows #Fixed effects model: #Prior definition k- length(levels(data$char)) I- diag( k-1 ) J- matrix( rep(1, (k-1)^2), c(k-1, k-1)) IJ- (1/3) * (diag(k-1) + J) prior = list(R = list(V = IJ, fix = 1)) myMCMC- MCMCglmm( char ~ trait + env:trait -1, rcov = ~us(trait):units, data = df, family=categorical, prior=prior ) #Model using the phylogeny as a random factor #Prior definition . (There might be some problems here...) Prior.phyl = list(R = list(V = IJ, fix = 1),G = list( G1 = list(V = IJ, n = k-1 ) ) ) Ainv-inverseA(phy)$Ainv df$species=df$taxa myMCMC.phyl- MCMCglmm( ch ~ trait + env:trait -1, random=~us(trait):species, rcov = ~us(trait):units, ginverse = list(species=Ainv), data = df, family=categorical, prior=prior.phyl) Thanks in advance, Rafa ### Simulated data
Re: [R-sig-phylo] Phylogenetic Canonical Correlations
Hi Jon. phytools (http://cran.r-project.org/package=phytools) has a function, phyl.cca, for phylogenetic canonical correlations. It returns the coefficients scores (in the species space - so, with phylogenetic correlation), correlations, and p-values. You can also just use pic and cancor to get the same coefficients, as follows: # simulate data tree-rtree(50) X-replicate(5,fastBM(tree)) Y-replicate(6,fastBM(tree)) # use phyl.cca result1-phyl.cca(tree,X,Y) # use pic cancor picX-apply(X,2,pic,phy=tree) picY-apply(Y,2,pic,phy=tree) result2-cancor(picX,picY,xcenter=F,ycenter=F) Note that the coefficients from result1 and result2 may not be equal, but scale by a constant (this will not change the correlation between corresponding canonical variables). # e.g. result1$xcoef/result2$xcoef result1$ycoef/result2$ycoef[,1:5] # should produce matrices in which each column of each matrix # is a scalar constant - Liam -- Liam J. Revell, Assistant Professor of Biology University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://phytools.blogspot.com On 7/13/2012 1:32 PM, Jonathan Mitchell wrote: Hello all, I'm trying to perform a canonical correlations analysis that takes phylogenetic non-independence into account properly (a la what Revall 2009 did for PCA). To this effect, I've written a script to do canonical correlates in R based entirely on SVD instead of QR decomp. (because I understand SVD better) that recapitulates the cancor() function ( http://nr.com/whp/notes/CanonCorrBySVD.pdf). The way I'm currently approaching this is, I'm centering each column of X and Y on their phylogenetic mean instead of their numeric mean, and multiplying the centered matrices by the inverse of the vcv. Is this appropriate/correct? Is there a better way? Simplified example code posted below: tree - rtree(10) X - replicate(5, rnorm(10)) Y - replicate(6, rnorm(10)) rownames(X) - rownames(Y) - tree$tip.label C - vcv(tree) inC - solve(C) Ones - rep(1, nrow(X)) PMx - t(solve(t(Ones) %*% inC %*% Ones) %*% t(Ones) %*%inC %*% X) PMy - t(solve(t(Ones) %*% inC %*% Ones) %*% t(Ones) %*%inC %*% Y) pcX - invC %*% (X - Ones %*% t(PMx)) pcY - invC %*% (Y - Ones %*% t(PMy)) #SVD of X, Y and Ux'Uy Xs - svd(pcX) Ys - svd(pcY) uTu - t(Xs$u) %*% Ys$u uTus - svd(uTu) #Canonical coefficients for X A - Xs$v %*% solve(diag(Xs$d)) %*% uTus$u #Canonical coefficients for Y B - Ys$v %*% solve(diag(Ys$d)) %*% uTus$v -- Jon _ Jonathan S. Mitchell http://home.uchicago.edu/~mitchelljs/ PhD Student Committee on Evolutionary Biology The University of Chicago 1025 57th Str, Culver Hall 402 Chicago, IL 60637 Geology Department The Field Museum of Natural History 1400 S. Lake Shore Dr. Chicago, IL 60605 [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] Within-species error in PGLS (or similar)
Hi Matt, David, and Jay. Yes, the general approach described in Ives et al. (2007; Syst. Biol.) is implemented for different univariate tests and methods, including model fitting, in phytools (as well as in geiger); however the regression model of Ives et al. has not yet been implemented in R, so far as I know. I have made an attempt to write a function for this method, pgls.Ives, and it is now posted to the phytools page (link to code: http://faculty.umb.edu/liam.revell/phytools/pgls.Ives/v0.1/pgls.Ives.R). This version only allows bivariate (y=b0+b1*x+e) regression and a simple Brownian error structure, but does incorporate sampling error in x y following Ives et al. If the reported sampling error is zero, then the function performs standard PGLS yields the same estimators as gls(...,correlation=corBrownian(...)). Note that I just did this today and it has been subjected to very limited testing and it may contain bugs or errors as the math of Ives et al. was not completely straightforward to me in a couple of places (here I tried to use common sense). source(pgls.Ives.R) args(pgls.Ives) function (tree, X, y, Vx, Vy, Cxy) tree is a phylo object; X is a *vector* containing x, with names(X) containing the tree tip labels; y is a vector of y with names(y) as before; Vx, Vy, Cxy are vectors with the sampling variances (squared standard errors) or covariances for X, y, and between X and y with names(...) as before. If anyone would like to compare to Ives Garland's MATLAB code I would love to hear the result as I only hope I got this right. All the best, Liam Liam J. Revell, Assistant Professor of Biology University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://phytools.blogspot.com On 7/16/2012 7:24 PM, Matt Pennell wrote: David and Jay, I am not sure if Liam and Graham's was designed for phylogenetic regression per se. Tony Ives has done exactly this but his code is only in Matlab as far as I understand. An alternative would be to use RMA approach for this which Liam has created (but note Simon Blomberg's warning about this in the comments section): http://phytools.blogspot.com/2011/04/phylogenetic-rma-regression.html Another (possibly less problematic) option is to use Joe Felsenstein's approach which is in the current implementation of the progarm contrast in the phylip package. This allows one to do independent contrasts on traits with multiple measurements at the species level. Comparative Methods with Sampling Error and Within�Species Variation: Contrasts Revisited and Revised. Joseph Felsenstein The American Naturalist , Vol. 171, No. 6 (June 2008), pp. 713-725 Personally I would do this with the R package MCMCglmm as it allows for greater flexibility in the distribution of the tip values but it is perhaps more burdensome to run. HADFIELD, J. D. and NAKAGAWA, S. (2010), General quantitative genetic methods for comparative biology: phylogenies, taxonomies and multi-trait models for continuous and categorical characters. Journal of Evolutionary Biology, 23: 494–508. doi: 10./j.1420-9101.2009.01915.x hope this helps. cheers, matt On Mon, Jul 16, 2012 at 12:42 PM, Jay Fitzsimmons jayfitzsimm...@gmail.comwrote: Hello David, There is at least one package in R that incorporates intraspecific variation within a phylogenetic context. phytools. The method is described in: Revell, L. J. and R. G. Reynolds. In Press. A new Bayesian method for fitting evolutionary models to comparative data with intraspecific variation. Evolution. http://onlinelibrary.wiley.com/doi/10./j.1558-5646.2012.01645.x/abstract If others know better than me they should say so because while I plan to use phytools in the next month for the exact purpose David Winter describes, I haven't yet done so. Jay PS: thank you to everyone on this list - I am teaching myself R and have found the posts very helpful. On Wed, Jul 11, 2012 at 11:23 PM, David Winter david.win...@gmail.com wrote: Hi all, Sorry if this has been dealt with here before, I searched the list but couldn't find a thread quite the same as this. Are there methods in R for performing phylogenetic regression on datasets with many measurements per tip. We have data from many individuals that includes a measurement and several ecological variables that might be expected to predict that measurement. We also have a species-level phylogeny for these individuals, and we can happily assign each data point to a tip in this phylogeny. My first inclination was to crudely incorporate the evolutionary history of these individuals into our models by making a big polytomy for each species, but reading Ives et al 2007 (dx.doi.org/10.1080/10635150701313830) it seems there is a model of phylogenetic regression that include within-tip variation? Does anyone know if this is implemented in R? It's possible I'm entirely on the wrong track here, in which case I'm happy
Re: [R-sig-phylo] simulate traits evolution in correlated with body mass
Hi Andrew. For desired correlation r, size data x, and tree you could just do: library(phytools) y-r*x+sqrt(1-r^2)*fastBM(tree) This should give you the correlation r on average and the y|x should be Brownian. (If x is Brownian, then both y x y|x will be too.) - Liam Liam J. Revell, Assistant Professor of Biology University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://phytools.blogspot.com On 7/24/2012 2:25 PM, Andrew Barr wrote: Hello everyone, I have a tree, with body mass data for the tip taxa. I am interested in simulating the evolution of continuous traits by BM that are correlated with body mass. I want to use the body mass tip data that I have to inform the character simulation, so that my resulting simulated tip values are highly correlated with my actual body mass data. Can anyone suggest an appropriate way to simulate this? I know I can simulate correlated characters with geiger::sim.char(), but I don't know how I could include my actually body mass data using this method. Thanks very much, W. Andrew Barr PhD Candidate Dept of Anthropology University of Texas at Austin www.ancienteco.com ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] simulate traits evolution in correlated with body mass
Actually, modify my previous email. That only works for sig^2(x)=1.0. It should be: library(phytools) y-r*x+sqrt(1-r^2)*fastBM(tree,sig2=mean(pic(x,tree)^2)) - Liam Liam J. Revell, Assistant Professor of Biology University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://phytools.blogspot.com On 7/24/2012 4:47 PM, Liam J. Revell wrote: Hi Andrew. For desired correlation r, size data x, and tree you could just do: library(phytools) y-r*x+sqrt(1-r^2)*fastBM(tree) This should give you the correlation r on average and the y|x should be Brownian. (If x is Brownian, then both y x y|x will be too.) - Liam Liam J. Revell, Assistant Professor of Biology University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://phytools.blogspot.com On 7/24/2012 2:25 PM, Andrew Barr wrote: Hello everyone, I have a tree, with body mass data for the tip taxa. I am interested in simulating the evolution of continuous traits by BM that are correlated with body mass. I want to use the body mass tip data that I have to inform the character simulation, so that my resulting simulated tip values are highly correlated with my actual body mass data. Can anyone suggest an appropriate way to simulate this? I know I can simulate correlated characters with geiger::sim.char(), but I don't know how I could include my actually body mass data using this method. Thanks very much, W. Andrew Barr PhD Candidate Dept of Anthropology University of Texas at Austin www.ancienteco.com ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] find mrca of multiple taxa.
Hi Ben. Take a look at the function findMRCA in my package, phytools. I think this does what you want. Good luck. Liam Liam J. Revell University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://phytools.blogspot.com -Original Message- From: Ben Weinstein Sent: Wednesday, August 15, 2012 9:38 AM To: r-sig-phylo@r-project.org Cc: Shea Lambert; Antonin Machac Subject: [R-sig-phylo] find mrca of multiple taxa. Hello all, I'm trying to replace a clade of tips with a single tip, placed at the mrca of all taxa clade (similar Q to here https://stat.ethz.ch/pipermail/r-sig-phylo/2012-March/001895.html). Before i bind tree, I just need to a nice way to identify which node encompasses all descendants. I see many packages and functions, but i can't quite get the result i want. Forgive me if i've just missed this function somewhere, it seems hard to believe it hasn't been written explicitly. Clearly the package phylobase knows which node i want, when i subset by the tips.include library(phylobase) data(geospiza) nodeLabels(geospiza) - paste(N, nodeId(geospiza, internal), sep=) geotree - extractTree(geospiza) ## subset examples tips - c(difficilis, fortis, fuliginosa, fusca, olivacea, pallida, parvulus, scandens) subset(geotree, tips.include=tips) How can i get it to return which node it used, numbered from the original tree? My other attempt was to use expand.grid and mrca through an sapply command to brute force walk through every pair of taxa, and then take the maximum node number (since its farthest up the tree). This seems inelegant, and i'm not convinced it works 100% of the time. Thanks for your input. Ben Weinstein -- Ben Weinstein Graduate Student Ecology and Evolution Stony Brook University http://life.bio.sunysb.edu/~bweinste/index.html [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] Plot size
Hi Gwennael. I'm not totally sure I understand the problem, but this might help. You can use windows (in Windows) or dev.new to creating a plotting window with you desired dimensions. plot.phylo will then automatically rescale the branch lengths to fit. You can also control the font size from within plot.phylo. For example: library(ape) dev.new(width=2,height=10) plot(tree,cex=0.2,no.margin=T) savePlot(test1.pdf,type=pdf) # or, using phytools plotTree, which gets rid of some of the excess # whitespace library(phytools) dev.new(width=2,height=10) tree-compute.brlen(tree) # since your tree has no edge lengths plotTree(tree,fsize=0.2,ftype=i,lwd=0.4,pts=F) savePlot(test2.pdf,type=pdf) In addition, in an interactive session the tree (but not the labels) will automatically resize when you manually resize the plotting window. Is this at all helpful? All the best, Liam Liam J. Revell, Assistant Professor of Biology University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://phytools.blogspot.com On 8/28/2012 1:34 PM, Gwennaël Bataille wrote: Dear R-sig-phylo users, I would like to rescale a large tree, but I am not an expert of graphs in R... My labels overlap with each other. And I would like to find an option to put more space between each branch, but could not find one. So I tried to resize the labels (playing with cex parameter), but now the branches are far too long in comparison with the labels... So I would like to reduce also branch lengths : - I can define new values for edge.length but I want my taxa to be aligned. So, I prefer use.edge.length=F - I also thought about playing with x.lim parameter, but my graph is then truncated rather than resized... Any help would be appreciated. Thanks a lot in advance, Gwennaël ( gwennael.batai...@uclouvain.be mailto:gwennael.batai...@uclouvain.be ) PS: Here are the script and the data : #Question r-sig-phylo library(ape) t - (1,2,3),(18,19,20,21,22,23,24,25,26,27,28,29,30,31)),(4,5),(6,7,8),(9,10,11,12,13,14),(15,16,17),((32,33),34,((35,36,37),38,39))),40,42),41),(43,(44,45,46),47,((48,49),50))),51,52,53,54),(((55,((56,(57,58)),(59,60),61)),((62,63),64,65)),((66,67,68,69,70,71,72),73,(74,75,76,77,((78,(80,82)),(84,87)),((79,(81,83)),89)),(85,86)),88),(90,91,92))),((93,((94,95),((96,97,98,99,100,101,102,103,104,105,106),(118,(119,120),121,122,(123,124,125,126,127,128),129,(((130,131,132,133,134,135),136,138,139,((141,142),((143,144),(145,(146,147),137,140,(148,149,150,151),(152,153,154,155,156,157,158,159,160),((161,(164,165,166),((172,((173,177),(178,183)),179,180,181,182,184),174,175,176,185,186,187,188)),(167,168,169,170,171)),162,163))),((107,108),109,((110,111,112,113,114,115),(116,117),(((189,(((256,257,258,259,260,261,262),263,279),((268,269,271),270,272,(273,276),274,275,277,278)),(264,265,266,267)),(280,281,282,283,284,285),((286,287,288,289),(291,292! ),(293 ,294)),290)),342,343),(344,345,346)),349),(347,348))),295,((296,301),297,298,299,300,302,303,304,305,306,307,308,309,310,311,312,313,314,315,316,317,318,319,320,321,322,323,(((324,325,327,328),326),329),330,331,332,333,334,335,336,337,338,339,340,341)),(350,351,352,353,354,355,356,357,358,359,360,361,362,363)),364),(365,366),(367,368,369),370)),(((191,192,193),(((199,201),200),(202,(207,208))),(203,204)),(205,206)),209),((236,237),(242,(((243,247),244),(245,(246,248)),249))),240),238,239),241),250,253),255),254),(251,252),(((194,195,196),(197,198)),(210,211),212),213,(214,(215,216))),(228,(229,230))),((217,((218,221),(219,220),222,223,224,225,226,227)),231,234),233),232),235)),190)); tree - read.tree(text=t) plot(tree) #Plot with reduced size of the labels windows(width=15, height=10, pointsize=10, rescale=fixed) par(xpd=NA, cex = 0.3) #Reduce size of the labels plot(tree, y.lim=c(0,400)) #Play with x.lim plot(tree, y.lim=c(0,400), x.lim=200) #Define new edge length tree2 - tree$edge.length - rep.int(5,544) plot(tree2, y.lim=c(0,400), x.lim=200, use.edge.length=T) [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] Alter the branch lengths of a Tree
Hi Fernando. I'm not entirely sure what you're going for, but let me try to be of some help. The tree is stored in memory as what is called a list - basically, a set of objects that can be of different types. The branch lengths of the tree are stored in the list element edge.length. So, for a phylo object with the variable name tree, you can access the branch lengths of the tree using tree$edge.length. (If your tree does not have branch lengths, this will be NULL.) Consequently, you can assign any arbitrary values to the branch lengths of the tree by assigning values to tree$edge.length. For instance, to set all branch lengths to 1.0, you would just do: tree$edge.length-rep(1,nrow(tree$edge)) The order of tree$edge.length corresponds to the row order of tree$edge, so if you want to see which branches in tree$edge.length correspond to which branches on the tree, you can do something like the following (here, I use a simple random tree for illustration): set.seed(1) tree-rtree(10); tree$edge.length-rep(1,nrow(tree$edge)) X-tree$edge X[X[,2]%in%1:length(tree$tip),2]- tree$tip[X[X[,2]%in%1:length(tree$tip),2]] names(tree$edge.length)-paste(X[,1],X[,2],sep=,) plot(tree); nodelabels() tree$edge.length Now if, for instance, I want to make the branch length leading from node 13 to tip taxon t6 to (for instance) 3.0, I could just do: tree$edge.length[13,t6]-3 plot(tree); nodelabels() Of course, if we have some simple criterion on which we want to change our branch lengths, we can do that to. For instance, if we wanted to set all branch lengths with length 1 to 1.0 and all branch lengths with length 1 to 2.0, we could do it as follows: tree$edge.length-runif(n=nrow(tree$edge),min=0,max=2) tree$edge.length tree$edge.length[tree$edge.length1]-1 tree$edge.length[tree$edge.length1]-2 tree$edge.length plot(tree) I hope this is of some help. All the best, Liam Liam J. Revell, Assistant Professor of Biology University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://phytools.blogspot.com On 8/28/2012 4:15 PM, Fernando Sobral wrote: Hi, I am trying to alter the branch lengths of a phylogeny so that the smaller branch has size igual 1 and the larger branch is the proportional sum of the smaller branchs. For example, if the larger branch is equivalent to ten smaller branchs, it should have size igual to 10. I already tryed use some functions like compute.brlen(), compute.brtime() and rescaleTree(), but did not work. Any help would be greatly appreciated. Thanks, Fernando L. Sobral [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] tips to root brach length
Hi. Try: diag(vcv.phylo(tree)) nodeHeights in phytools will also give you a matrix with all the heights of internal and terminal nodes in the same order as tree$edge. - Liam Liam J. Revell University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://phytools.blogspot.com -Original Message- From: boyang zhe Sent: Monday, September 03, 2012 5:17 AM To: r-sig-phylo@r-project.org Subject: [R-sig-phylo] tips to root brach length Hi, everyone I am new to ape and R programming. I have an unrooted tree. I use root() function to reroot the tree by one outgroup. and then I try to extract root to tip distance. the tree$edge.length only store the brach length between each tip and corresponding internal node. So how I extract the distance between root and the tips? Thanks first! Boyang ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] variation in rates over time
Hi Jason. Matt is absolutely correct. You can do this with phytools. Say, for instance, you have an ultrametric phylogeny with branches in millions of years (tree) and data vector containing the trait values for species (x) and you want to test the hypothesis that the last 3.4 my has a different rate of evolution than the rest of the tree, you could do this as follows: library(phytools) # load phytools tree-make.era.map(tree,c(0,max(nodeHeights(tree))-3.4)) plotSimmap(tree,pts=F,lwd=3) # visualize fit-brownie.lite(tree,x) # fit model That's it. Good luck. Liam Liam J. Revell, Assistant Professor of Biology University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://phytools.blogspot.com On 9/17/2012 7:46 PM, Matt Pennell wrote: Jason, I think the best way to do this is with the approach of O'Meara et al. 2006 Evolution Brownie. Liam Revell has implemented this in R in his package phytools. You can modify the steps taken in this tutorial here http://phytools.blogspot.com/2011/07/running-brownielite-for-arbitrarily.html perhaps in conjunction with the function make.era.map() http://phytools.blogspot.com/2011/10/new-function-to-plot-eras-on-tree.html though I admittedly have not tried this myself. Perhaps Liam or someone else has a better explanation but I hope this is at least somewhat helpful. cheers, matt On Mon, Sep 17, 2012 at 7:33 PM, Jason S jas2...@yahoo.com wrote: Hello, I see that there are several interesting alternatives to test for different rates among clades. However, I was wondering if there is a method to test for varying rates over time. I'm aware of Pagel's delta and the EB model, but I was thinking more in terms of testing if there is a different rate for the entire tree after a specified point in time. For instance, if a snail predator colonizes an island 3.4 Mya, is there evidence for an increased rate of evolution in the prey after that point in time? Something like two lambdas, one for before and one for after that point in time. Thanks! Jason [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] variation in rates over time, unexpected message when using Brownie.lite
Hi Agus. It looks like in your example the Hessian matrix of partial second derivatives (the curvature) at the optimum is numerically singular. I'm not sure what this means - it could be that the surface is very flat, or alternatively that it is highly correlated. Is the error fatal, or do you get a result? The Hessian matrix is only used to estimate standard errors of the fitted parameter values, so if the option to compute the variances from the Hessian could be turned off (not possible in the present version), presumably the analysis would run through. Another option is to use brownieREML in phytools or, as Brian pointed out yesterday, the OUwie package. brownieREML is a lighter REML version of brownie.lite, which does not compute the Hessian or the standard errors on parameter estimates. OUwie is a separate package that uses phytools mapped trees to do the analyses of brownie.lite in one of its functions. All the best, Liam Liam J. Revell, Assistant Professor of Biology University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://phytools.blogspot.com On 9/18/2012 4:08 PM, Agus Camacho wrote: Dear all, I tried to test something similar to Jason, but relating the change in rates to the acquisition of a trait, instead to a specific date. For that, I used a phylogeny with ten taxa, without outgroup. I used exactly the same script gently proposed by Liam but got the following error message: Error in solve.default(model1$hessian) :Lapack routine dgesv: system is exactly singular I've been looking through the internet but was not able to identify what causes this error. Any hint about that? BTW, anybody knows how to directly answer a message found in the digest of the mail list, without leaving the thread? Below, I pasted the complete thread dealing with this topic. Cheers, Agus Message: 9 Date: Tue, 18 Sep 2012 00:13:00 -0400 From: Brian O'Meara bome...@utk.edu To: Jason S jas2...@yahoo.com Cc: R-sig-phylo@r-project.org R-sig-phylo@r-project.org Subject: Re: [R-sig-phylo] variation in rates over time Message-ID: cakywhkq-jjroqzdvfutphi4xopxam+aaacu64nldb3inxoz...@mail.gmail.com Content-Type: text/plain I agree with the suggestions so far. I just wanted to point out a few more alternatives: You could use the geiger package to estimate the best scaling for the tworatetree transformation to do this (should be equivalent to the earlier solutions, though it would require running optimization). You could also use the OUwie package with a tree that has been given simmap mappings using phytools. The advantage of this is that you could evaluate Brownian models but you could also look at various Ornstein-Uhlenbeck models (though note that while there is information about different Brownian rates before and after a time slice, info about different alphas (strength of pull parameter) and thetas (the attractor, aka optimal value) is rapidly (but not immediately) lost under an OU process). For completeness, especially for citations, note that O'Meara et al. (2006) and Thomas et al. (2006) independently arrived at essentially the same method, so it is worth reading both papers. Best, Brian ___ Brian O'Meara Assistant Professor Dept. of Ecology Evolutionary Biology U. of Tennessee, Knoxville http://www.brianomeara.info Students wanted: Applications due Dec. 15, annually Postdoc collaborators wanted: Check NIMBioS' website Calendar: http://www.brianomeara.info/calendars/omeara On Mon, Sep 17, 2012 at 9:11 PM, Jason S jas2...@yahoo.com wrote: Thanks, guys. That's exactly what I needed. From: Liam J. Revell liam.rev...@umb.edu To: Matt Pennell mwpenn...@gmail.com -project.org Sent: Monday, September 17, 2012 9:22 PM Subject: Re: [R-sig-phylo] variation in rates over time Hi Jason. Matt is absolutely correct. You can do this with phytools. Say, for instance, you have an ultrametric phylogeny with branches in millions of years (tree) and data vector containing the trait values for species (x) and you want to test the hypothesis that the last 3.4 my has a different rate of evolution than the rest of the tree, you could do this as follows: library(phytools) # load phytools tree-make.era.map(tree,c(0,max(nodeHeights(tree))-3.4)) plotSimmap(tree,pts=F,lwd=3) # visualize fit-brownie.lite(tree,x) # fit model That's it. Good luck. Liam Liam J. Revell, Assistant Professor of Biology University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://phytools.blogspot.com On 9/17/2012 7:46 PM, Matt Pennell wrote: Jason, I think the best way to do this is with the approach of O'Meara et al. 2006 Evolution Brownie. Liam Revell has implemented this in R in his package phytools. You can modify the steps taken in this tutorial here http://phytools.blogspot.com/2011/07/running-brownielite
Re: [R-sig-phylo] phy.manova from Geiger package gives an error of different variable length with variables of equal length
Hi Agus. I think the problem is that c() concatenates your data for the continuous dependent variables in your model into one long vector. Try instead: table-as.matrix(table) Y-cbind(table[,2],table[,3]) x-as.factor(table[,1]) phy.manova(tree,Y,x) Good luck. - Liam Liam J. Revell, Assistant Professor of Biology University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://phytools.blogspot.com On 9/25/2012 11:54 AM, Agus Camacho wrote: Hope not being abusing of this list. Possibly, I missed something. I tried both with matrix and dataframe and reached to the same problem: phy.manova(tree, c(table$option,table$difoption), table$morph, data.names=NULL, nsim=1000, test=Wilks)Warning: no tip labels, order assumed to be the same as in the treeError in model.frame.default(formula = as.matrix(td$data) ~ group, drop.unused.levels = TRUE) phy.manova(tree, c(table[,option],table[,difoption]), table[,morph], data.names=NULL, nsim=1000, test=Wilks)Error in model.frame.default(formula = as.matrix(td$data) ~ group, drop.unused.levels = TRUE) : variable lengths differ (found for 'group') length(table[,morph]) [1] 10 length(table[,option])[1] 10 length(table[,difoption]) [1] 10 theregoes the data: morph option difoption C._leiolepis 2 2.916667 -0.167 C._nicterus 2 2.354167 -0.208 C._sinebrachiatus 2 2.464286 0.2857143 S._catimbau 2 2.411765 0.2352941 N._ablephara 2 2.795455 0.1363636 P._erythrocercus 1 1.545455 1.0909091 P._tetradactylus 1 1.74 1.080 V._rubricauda 1 1.462500 0.725 M._maximiliani1 1.269231 0.3846154 P._paeminosus 1 1.307692 0.6153846 Morph is of class factor. For the developers, thanks for making all these analyses possible. Cheers, Agus ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] breaking up phylogeny figure
Hi Ben. You've probably found a way to do this already (since I see that your original message was sent in early October), most likely outside of R, but I just developed and posted an R function for this which I will add to the next update of phytools. I have described this function, along with links to the code, on my blog: http://blog.phytools.org/2012/11/function-to-break-up-plotted-tree-into.html All the best, Liam Liam J. Revell, Assistant Professor of Biology University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://phytools.blogspot.com On 10/3/2012 12:45 PM, Ben Weinstein wrote: Hi all, I'm wondering if there is a way to plot a phylogeny in R that is broken up and displayed side by side to condense space. I've made a backbone in R that i'd like to make into a figure, but it is too tall. If this isn't the write setup, could someone suggest what program (and format for me to write.tree) from R. I am using windows, in terms of potential third part programs. Thanks, ben ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] Phylogenetic signal and PGLS
Hi Antigoni. The other responders are correct, but just to be a little more concrete: # using ape nlme library(ape); library(nlme) # assuming your data are contained in named vectors x y y-y[names(x)] fit-gls(y~x,data.frame(x,y),correlation=corPagel(1,tree,fixed=FALSE),method=ML) # assuming your data are in data frame X, with column names var1, # var2, etc.; for example: fit-gls(var1~var2+var3,X,correlation=corPagel(1,tree,fixed=FALSE),method=ML) # using caper library(caper) # assuming your data are in named vectors x y y-y[names(x)] X-data.frame(x,y,Species=names(x)) fit-pgls(y~x,comparative.data(tree,X,Species),lambda=ML) # using phytools library(phytools) # assuming your data are in named vectors x y fit-phyl.resid(tree,x,y,method=lambda) Although I am the author of phytools, I would say that the first two options should be preferred in general because they can be used with generic functions such as summary residuals. (With the *possible* exception that phyl.resid seems to be a little bit faster on large trees. pgls in caper, for instance, took more than 4 min. to run a tree of 1000 tips on my computer.) I hope this helps. All the best, Liam Liam J. Revell, Assistant Professor of Biology University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://blog.phytools.org On 11/28/2012 12:02 PM, Antigoni Kaliontzopoulou wrote: Hello everyone, I am trying to implement Liam Revell's suggestion on the evaluation of Pagel's lambda simultaneous to fitting PGLS to minimize the effects of wrong model selection (OLS vs. PGLS) on species data (i.e. Revell 2010, Methods in Ecology and Evolution 1: 319-329). Does anyone know if this kind of simultaneous optimization of lambda and PGLS parameters is presently implemented in R? The obvious place would be phytools, but I could not find anything similar in that package. I would be grateful if you can provide any suggestions. Antigoni ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] truncated downwards plot.phylo and plot margin
Hi Zech. Regarding point (1), this is indeed a bug in plot.phylo, however there is a work-around. Try the following: library(ape) data(bird.orders) plot(bird.orders,direction=downwards) axis(2) plot(bird.orders,direction=downwards,y.lim=c(-15,30)) Regarding point (2), plot.phylo(...,no.margin=TRUE) *does* plot without margins; however there is some space within a plotted graph between the (invisible) plot axes and plotted points and lines. If we want our edges to butt right up against the plot window, we can also accomplish this with the arguments y.lim and x.lim. For instance, I found via trial error that: plot(bird.orders,x.lim=c(1.3,35.5),y.lim=c(1.5,22.4),no.margin=T) got me pretty darn close. All the best, Liam Liam J. Revell, Assistant Professor of Biology University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://blog.phytools.org On 3/5/2013 2:32 AM, Not To Miss wrote: Hi, When plotting the phylo tree downwards, the tip labels are truncated: library(ape) data(bird.orders) plot(bird.orders, direction='downwards') I found the problem on the website: http://www.mail-archive.com/r-sig-phylo@r-project.org/msg00298.html. It seems still not solved? It's kind of annoying. Another annoying thing is that, when setting the margins to zero, the plot.phylo still plot some margins around the tree, which is totally inconsistent to the general plotting convention. Is it possible to be changed? Thanks, Zech [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/ ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] testing for correlates of rates of evolution
Hi John Matt. What about the admittedly ad hoc approach of computing the correlation between the states at ancestral nodes for x the squared contrasts for corresponding nodes for y? Then you can generate a null distribution for the test statistic (say, a Pearson or Spearman rank correlation) by simulation. This seems to give reasonable type I error when the null is correct, and when I simulate under the alternative (i.e., the rate of Brownian evolution along a branch depends on the state at the originating node) it sometimes is significant. Here's a function that does what I've described (I think - please check it carefully!). It needs phytools and all dependencies. ratebystate-function(tree,x,y,nsim=100,method=c(pearson,spearman)){ method-method[1] if(!is.binary.tree(tree)) tree-multi2di(tree) V-phyl.vcv(cbind(x,y),vcv(tree),lambda=1)$R a-fastAnc(tree,x) b-pic(y,tree)[names(a)]^2 r-cor(a,b,method=method) beta-setNames(lm(b~a)$coefficients[2],NULL) foo-function(tree,V){ XY-sim.corrs(tree,V) a-fastAnc(tree,XY[,1]) b-pic(XY[,2],tree)[names(a)]^2 r-cor(a,b,method=method) return(r) } r.null-c(r,replicate(nsim-1,foo(tree,V))) P-mean(abs(r.null)=abs(r)) return(list(beta=beta,r=r,P=P,method=method)) } Perhaps this is a good idea. I don't know. All the best, Liam Liam J. Revell, Assistant Professor of Biology University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://blog.phytools.org On 3/11/2013 4:03 PM, Matt Pennell wrote: John, This is a tricky question. If your independent variables were discrete, you could use a stochastic character mapping approach to map state regimes onto your tree and ask whether the regimes had different rates using a model selection approach. (This could be done with the R packages phytools or ouwie, depending on what models of trait evolution you are interested in investigating). However, since your independent variables are continuous, there is no equivalent of the stochastic mapping approach to answer this question. As far as I am aware, no model-based framework exists to address your question (sorry that to be a downer). One could conceivably derive such a model following Rich Fitzjohn's approach in QuaSSE (Sys Bio 2010) but instead of the rate of speciation/extinction depending on the state of the continuous variable, let the rate of a second variable be a function of the state of the first. But this would certainly be a lot of effort to accomplish. I agree with you as I do not think getting rates from standardized independent contrasts (sensu Garland 1992) will really allow you to get at your question. the TL;DR version is that no such method exists (at least to my knowledge) but this would definitely be a useful innovation. hope this was at least somewhat helpful. cheers, matt On Mon, Mar 11, 2013 at 12:50 PM, john d dobzhan...@gmail.com wrote: Dear colleagues, I got a philosophical/methodological/practical question. I have a continuous dependent variable (e.g. range size) and a few independent variables (e.g. body mass, encephalization ratio), and I want to test how the rate of evolution of the dependent variable is affected by the independent variables. The PCMs that I'm familiar with cannot be used to answer this question, because they usually try to predict the dependent variable based on the independent variables (e.g. PGLM) instead of looking at the rates of evolution. The whole thing gets tricky if one decides to deal with the rates of evolution of the indepentent variables as well (or not). I guess one possibility would be to use standardized independent contrasts (as in Garland 1992) for the estimation of rates. But I'm not sure how to try to predict the *rate* of evolution of range size from the values of the independent variables (and not their own rates, which is what I guess I'd get if I transformed all variables into standardized contrasts). Any thoughts? John ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/ [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/ ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] testing for correlates of rates of evolution
I did a little further exploration of this proposed method - the results discussion are here: http://blog.phytools.org/2013/03/investigating-whether-rate-of-one.html Maybe this will be of some help in deciding the best approach to go forward with. All the best, Liam Liam J. Revell, Assistant Professor of Biology University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://blog.phytools.org On 3/11/2013 6:03 PM, Liam J. Revell wrote: Hi John Matt. What about the admittedly ad hoc approach of computing the correlation between the states at ancestral nodes for x the squared contrasts for corresponding nodes for y? Then you can generate a null distribution for the test statistic (say, a Pearson or Spearman rank correlation) by simulation. This seems to give reasonable type I error when the null is correct, and when I simulate under the alternative (i.e., the rate of Brownian evolution along a branch depends on the state at the originating node) it sometimes is significant. Here's a function that does what I've described (I think - please check it carefully!). It needs phytools and all dependencies. ratebystate-function(tree,x,y,nsim=100,method=c(pearson,spearman)){ method-method[1] if(!is.binary.tree(tree)) tree-multi2di(tree) V-phyl.vcv(cbind(x,y),vcv(tree),lambda=1)$R a-fastAnc(tree,x) b-pic(y,tree)[names(a)]^2 r-cor(a,b,method=method) beta-setNames(lm(b~a)$coefficients[2],NULL) foo-function(tree,V){ XY-sim.corrs(tree,V) a-fastAnc(tree,XY[,1]) b-pic(XY[,2],tree)[names(a)]^2 r-cor(a,b,method=method) return(r) } r.null-c(r,replicate(nsim-1,foo(tree,V))) P-mean(abs(r.null)=abs(r)) return(list(beta=beta,r=r,P=P,method=method)) } Perhaps this is a good idea. I don't know. All the best, Liam Liam J. Revell, Assistant Professor of Biology University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://blog.phytools.org On 3/11/2013 4:03 PM, Matt Pennell wrote: John, This is a tricky question. If your independent variables were discrete, you could use a stochastic character mapping approach to map state regimes onto your tree and ask whether the regimes had different rates using a model selection approach. (This could be done with the R packages phytools or ouwie, depending on what models of trait evolution you are interested in investigating). However, since your independent variables are continuous, there is no equivalent of the stochastic mapping approach to answer this question. As far as I am aware, no model-based framework exists to address your question (sorry that to be a downer). One could conceivably derive such a model following Rich Fitzjohn's approach in QuaSSE (Sys Bio 2010) but instead of the rate of speciation/extinction depending on the state of the continuous variable, let the rate of a second variable be a function of the state of the first. But this would certainly be a lot of effort to accomplish. I agree with you as I do not think getting rates from standardized independent contrasts (sensu Garland 1992) will really allow you to get at your question. the TL;DR version is that no such method exists (at least to my knowledge) but this would definitely be a useful innovation. hope this was at least somewhat helpful. cheers, matt On Mon, Mar 11, 2013 at 12:50 PM, john d dobzhan...@gmail.com wrote: Dear colleagues, I got a philosophical/methodological/practical question. I have a continuous dependent variable (e.g. range size) and a few independent variables (e.g. body mass, encephalization ratio), and I want to test how the rate of evolution of the dependent variable is affected by the independent variables. The PCMs that I'm familiar with cannot be used to answer this question, because they usually try to predict the dependent variable based on the independent variables (e.g. PGLM) instead of looking at the rates of evolution. The whole thing gets tricky if one decides to deal with the rates of evolution of the indepentent variables as well (or not). I guess one possibility would be to use standardized independent contrasts (as in Garland 1992) for the estimation of rates. But I'm not sure how to try to predict the *rate* of evolution of range size from the values of the independent variables (and not their own rates, which is what I guess I'd get if I transformed all variables into standardized contrasts). Any thoughts? John ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/ [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http
Re: [R-sig-phylo] testing for correlates of rates of evolution
Hi Rob - Regarding your comment: However, I think that this test is not quite the same as asking whether the rate of evolution of y depends on x. For example, it's possible you could (correctly) see a relationship with Liam's test even if there was no relationship between the rate of evolution of y and x - as long as larger ancestral x's produce higher variance in y, then Liam's test will reveal a relationship. Perhaps I'm missing something, but this is exactly what we are looking for - that is, an effect of x on the instantaneous variance of the Brownian process of evolution in y (i.e., the rate of evolution in y, sensu O'Meara et al. 2006 and other refs). All the best, Liam Liam J. Revell, Assistant Professor of Biology University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://blog.phytools.org On 3/15/2013 7:15 PM, Rob Lanfear wrote: Hi All, Just a follow up on this. I was thinking about what Liam suggested, and I think it's a different test to what I suggested, but maybe I'm wrong. In particular, Liam's using squared contrasts in y, so that's asking whether the absolute size of changes in y depends on x at the ancestral node. I might have missed something here, but that sounds very similar in principle to Freckleton's test of whether the variance of trait differences is unrelated to their absolute values [1], except that the latter looks for correlations between the absolute value of differences in x versus x at the ancestral node. It might be useful to consult that paper [1] to get some more ideas for how to interpret those kinds of results. However, I think that this test is not quite the same as asking whether the rate of evolution of y depends on x. For example, it's possible you could (correctly) see a relationship with Liam's test even if there was no relationship between the rate of evolution of y and x - as long as larger ancestral x's produce higher variance in y, then Liam's test will reveal a relationship. But the direction of the effect of x on the rate of y could still be completely randomly assigned among datapoints (that information disappears when squaring the y's). Not sure if I'm just confused here. Any thoughts? Rob On 16 March 2013 09:30, john d dobzhan...@gmail.com mailto:dobzhan...@gmail.com wrote: Thank you all for your ideas. I'll probably explore further Liam's method. sincerely, john On Tue, Mar 12, 2013 at 5:33 PM, Liam J. Revell liam.rev...@umb.edu mailto:liam.rev...@umb.edu wrote: I did a little further exploration of this proposed method - the results discussion are here: http://blog.phytools.org/2013/03/investigating-whether-rate-of-one.html Maybe this will be of some help in deciding the best approach to go forward with. All the best, Liam Liam J. Revell, Assistant Professor of Biology University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu mailto:liam.rev...@umb.edu blog: http://blog.phytools.org On 3/11/2013 6:03 PM, Liam J. Revell wrote: Hi John Matt. What about the admittedly ad hoc approach of computing the correlation between the states at ancestral nodes for x the squared contrasts for corresponding nodes for y? Then you can generate a null distribution for the test statistic (say, a Pearson or Spearman rank correlation) by simulation. This seems to give reasonable type I error when the null is correct, and when I simulate under the alternative (i.e., the rate of Brownian evolution along a branch depends on the state at the originating node) it sometimes is significant. Here's a function that does what I've described (I think - please check it carefully!). It needs phytools and all dependencies. ratebystate-function(tree,x,y,nsim=100,method=c(pearson,spearman)){ method-method[1] if(!is.binary.tree(tree)) tree-multi2di(tree) V-phyl.vcv(cbind(x,y),vcv(tree),lambda=1)$R a-fastAnc(tree,x) b-pic(y,tree)[names(a)]^2 r-cor(a,b,method=method) beta-setNames(lm(b~a)$coefficients[2],NULL) foo-function(tree,V){ XY-sim.corrs(tree,V) a-fastAnc(tree,XY[,1]) b-pic(XY[,2],tree)[names(a)]^2 r-cor(a,b,method=method) return(r) } r.null-c(r,replicate(nsim-1,foo(tree,V))) P-mean(abs(r.null)=abs(r)) return(list(beta=beta,r=r,P=P,method=method)) } Perhaps this is a good idea. I don't know. All the best, Liam Liam J. Revell, Assistant Professor of Biology University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu
Re: [R-sig-phylo] Concatenation of character matrices
Hi Thomas. For just two matrices (say, A B), this is easy: aa-sort(unique(c(rownames(A),rownames(B bb-sort(unique(c(colnames(A),colnames(B XX-matrix(NA,length(aa),length(bb),dimnames=list(aa,bb)) XX[rownames(A),colnames(B)]-A XX[rownames(B),colnames(B)]-B There may be some easier way to do this; and if your matrices were put in a list (say, when they were read into memory) we could easily apply even the method above automatically across the list automatically with no difficulty. Say, for list of matrices YY: aa-sort(unique(sapply(YY,rownames))) bb-sort(unique(sapply(YY,colnames))) XX-matrix(NA,length(aa),length(bb),dimnames=list(aa,bb)) for(i in 1:length(YY)) XX[rownames(YY[[i]]),colnames(YY[[i]])]-YY[[i]] at least. Hope this is helpful. Liam Liam J. Revell, Assistant Professor of Biology University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://blog.phytools.org On 3/20/2013 3:07 PM, thom.g wrote: Hi, I have a certain number of character matrices that I want to merge together but unfortunately all the matrices don't have the same number of taxa and characters. However there is always more than one character/taxa in common between each matrix. I have something like that: TaxaCharacter1 Character2 Character4 Taxon1 0 0 0 Taxon2 0 0 1 Taxon5 0 1 1 Taxa Character2 Character3 Character5 Taxon3 0 1 1 Taxon4 1 0 1 Taxon5 1 1 1 And I'd like something like that: Taxa Character1 Character2 Character3 Character4 Character5 Taxon1 0 0 ? 0 ? Taxon2 0 0 ? 1 ? Taxon3 ? 0 1 ? 1 Taxon4 ? 1 0 ? 1 Taxon5 0 1 1 1 1 Does someone already worked on this problem on R? The difference R base functions like merge() or aggregate() don't do the job unfortunately. Thanks, Thomas. Thomas Guillerme Trinity Postgraduate Research Studentship Macroecology and Macroevolution Research Group Zoology Department School of Natural Sciences Trinity College Dublin 2 College Green, Dublin 2 guill...@tcd.ie @TGuillerme +353877021326 http://www.tcd.ie/Zoology/research/ncooper/people.php [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/ ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] ace in ape how to derive ancestral states (which.max)
Hi Andres. To answer your question at its face value, the line of code you want is: anc-setNames(apply(ANC$lik.anc,1,function(x) names(which.max(x))),length(tree$tip.label)+1:tree$Nnode) (The setNames ensures that your resultant vector gets the node numbers for names.) This message really deserves a more lengthy response for two reasons: 1) The scaled likelihoods returned by ace(...,type=discrete) are actually conditional likelihoods of the subtree obtained via the pruning algorithm of Felsenstein. These should generally not be used for ancestral state estimation (we should use the marginal or joint reconstructions instead). 2) Extracting only the most likely state as *the* ancestral state at the node ignores the uncertainty of our estimate. Really we have a probability distribution for the states at each node. - Liam Liam J. Revell, Assistant Professor of Biology University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://blog.phytools.org On 3/20/2013 2:23 PM, Andrés Parada wrote: Sorry if this was addressed before. I reconstructed a binary discrete character with* ace* (coded as 0-1). As suggested in APER 2nd edition with the* Sylvia *case I can obtain the lik.anc values but I wanted to derive the ancestral states from this matrix. *First* *ANC - ace (data$character, tree, type=d, method=ML, model=ARD)* ANC$lik.anc look like this: 01 [1,] 9.999185e-01 8.147752e-05 *So I applied the same strategy as in the example mentioned above:* *anc - apply(ANC$lik.anc, 1, which.max)* This returned a matrix of with values of 1 or 2. Is there a way to modify the formula and sort this so it returns only 0 or 1 depending on which state has higher probability? Thank you very much for your help. [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/ ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] ace in ape how to derive ancestral states (which.max)
One final comment on this. Since the conditional scaled likelihoods of the subtree and the marginal ancestral state reconstruction are equivalent at the root node of the tree, a computationally simple (but slow) method for getting the marginal ancestral state reconstructions is to move the root to each node of the tree and recompute the conditional likelihoods using ace. (Described in Felsenstein 2004, p. 259.) This works only for a reversible model of evolution. I posted a function for this on my blog (although it is straightforward), here: http://blog.phytools.org/2013/03/a-little-more-on-ancestral-state.html. There are much more efficient algorithms for this, and I believe that this is what is implemented in diversitree by Rich Fitzjohn: http://www.zoology.ubc.ca/prog/diversitree/. And, to reiterate, you should not be using the conditional scaled likelihoods from ace as ancestral state estimates (except at the root). All the best, Liam Liam J. Revell, Assistant Professor of Biology University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://blog.phytools.org On 3/20/2013 6:02 PM, Liam J. Revell wrote: Just to follow up on ace(...,type=discrete), which I believe is not doing what many people think it is doing: http://blog.phytools.org/2013/03/conditional-scaled-likelihoods-in-ace.html. Please don't take this as a criticism of ace - but I believe that the difference between conditional, marginal, and joint reconstructions is not appreciated by many phylogeny method users. One option to get what I believe will asymptotically converge to the joint reconstructions would be to use the posterior frequencies from stochastic mapping under the model. This can be done using make.simmap as described in my blog post - but please make sure that you have the latest phytools release (phytools=0.2-27) if you want to do this, as there was a bug in earlier versions of make.simmap. Finally, something that I neglected to note in my post is that I believe that Rich Fitzjohn's package diversitree can be used to obtain the joint reconstructions under likelihood. All the best, Liam Liam J. Revell, Assistant Professor of Biology University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://blog.phytools.org On 3/20/2013 4:50 PM, Liam J. Revell wrote: Hi Andres. To answer your question at its face value, the line of code you want is: anc-setNames(apply(ANC$lik.anc,1,function(x) names(which.max(x))),length(tree$tip.label)+1:tree$Nnode) (The setNames ensures that your resultant vector gets the node numbers for names.) This message really deserves a more lengthy response for two reasons: 1) The scaled likelihoods returned by ace(...,type=discrete) are actually conditional likelihoods of the subtree obtained via the pruning algorithm of Felsenstein. These should generally not be used for ancestral state estimation (we should use the marginal or joint reconstructions instead). 2) Extracting only the most likely state as *the* ancestral state at the node ignores the uncertainty of our estimate. Really we have a probability distribution for the states at each node. - Liam Liam J. Revell, Assistant Professor of Biology University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://blog.phytools.org On 3/20/2013 2:23 PM, Andrés Parada wrote: Sorry if this was addressed before. I reconstructed a binary discrete character with* ace* (coded as 0-1). As suggested in APER 2nd edition with the* Sylvia *case I can obtain the lik.anc values but I wanted to derive the ancestral states from this matrix. *First* *ANC - ace (data$character, tree, type=d, method=ML, model=ARD)* ANC$lik.anc look like this: 01 [1,] 9.999185e-01 8.147752e-05 *So I applied the same strategy as in the example mentioned above:* *anc - apply(ANC$lik.anc, 1, which.max)* This returned a matrix of with values of 1 or 2. Is there a way to modify the formula and sort this so it returns only 0 or 1 depending on which state has higher probability? Thank you very much for your help. [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/ ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/ ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org
Re: [R-sig-phylo] Information on {ape] function nodelabels
Hi Nicholas. The nodes in a phylo object are numbered 1:tree$Nnode+N, where N is the number of tips in the tree tree$Nnode is the number of nodes. That means if your tree has 10 tips and 9 internal nodes, the node numbers will be 11, 12, 13, ..., 19. To use nodelabels(...,pie=XX) you can supply a matrix in which the rows sum to one (as in the transpose of your example matrix). If the node argument is left empty, then it is assumed that the rows are in node order (i.e., 11, 12, ..., 19). This is the order returned by (for example) ace. If you have a matrix in which the nodes are in a different order, you can use nodelabels(...,node=YY,pie=XX) in which YY supplies the node numbers (using the scheme above) for the rows of XX. Note that as per recent discussion on this email list, ace(...,type=discrete) should not be used for ancestral state reconstruction except for the root node of the tree. I describe this in some posts to my blog (http://blog.phytools.org/2013/03/conditional-scaled-likelihoods-in-ace.html and http://blog.phytools.org/2013/03/a-little-more-on-ancestral-state.html). The scaled likelihoods at internal nodes are conditional likelihoods for the subtree, not marginal ancestral state reconstructions. I believe that the help page for ace will be clarified in future versions of ape. Good luck. Liam Liam J. Revell, Assistant Professor of Biology University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://blog.phytools.org On 3/25/2013 12:09 PM, Nicholas Crouch wrote: Hi, Hoe does the function nodelabels match data to the phylogeny? Ancestral reconstruction produces a vector with a sample output: st [,1] [,2][,3] [,4][,5] [1,] 0.0001381085 0.007642601 0.01638118 0.001591162 0.001268793 [2,] 0.9998618915 0.992357399 0.98361882 0.998408838 0.998731207 The numbers across the top do no correlate with the node numbers on the phylogeny: compare nodelabels(pie=t(st)) with nodelabels() Would it be possible to create a list vector of the nodes in a phylogeny with the corresponding values from an ancestral reconstruction? An example could be something like: vector Node 1 [1] 0.05 [2] 0.95 Node 2 [1] 0.3 [2] 0.7 Thanks [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/ ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] combining p-values in PCMs
Hi Vladimir. I'm not clear on the source of your misunderstanding, so let me do my best. 1) I think that's right is casual shorthand in English. In this case I just mean that I think that this method will give us the correct variance on beta (but I'm not sure - hence 'I think'). It certainly seems better than computing the variance on the estimator from the single best tree or consensus tree, which ignores phylogenetic uncertainty; or from the variance in beta across trees, which ignores sampling error in the estimation of beta on any tree. It seems unlikely that this would dramatically underestimate the variance in beta. 2) The null hypothesis here is the slope of our fitted model (beta) is zero, but I don't think we need to focus specifically on null hypothesis testing. What we're really interested in is getting a true (or at least better) estimate of the uncertainty in the model coefficient that takes into account the different sources of error in our data tree. I just suggested a t-test because people are often interested in getting a p-value from their fitted model and the p-value from a single consensus or best tree will surely be too small. I hope that seems reasonable to you. All the best, Liam Liam J. Revell, Assistant Professor of Biology University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://blog.phytools.org On 4/2/2013 7:27 PM, Vladimir Minin wrote: Hi Liam, When you say I think that's right, I am not sure I understand what you mean. In particular, what is your null hypothesis stated mathematically and what frequentist guarantees do you have for the p-value you are computing? Typically, we desire the uniform distribution of p-values under the null. This would be easy to check by simulation, but this simulation would require to generate both sequence and non-sequence trait data, in other words we would need to decide what exactly the null hypothesis is here. I am suspicious of your approach, because, in general, Neyman-Pearson hypothesis testing framework has hard time accommodating uncertainty in nuisance parameters (phylogenies in this case). Unfortunately, I don't know if a rigorous answer to the original question posed by John exists, so I cannot suggest a practical alternative other than building a large Bayesian model that encompasses uncertainty about all its parameters. In that case, by the way, testing hypotheses still remains infeasible, because Bayesian framework lacks tools for testing model fit in the absence of an alternative (posterior predictive model checks are exceptions, but their interpretation/calibration is questionable anyway), meaning that Bayesians can only compare models, but cannot examine the fit of one (null) model in isolation. -Vladimir Vladimir Minin Assistant Professor Department of Statistics University of Washington, Seattle Padelford Hall C-315, Box 354322 Seattle, WA 98195-4322 telephone: 206-543-4302 web: www.stat.washington.edu/vminin http://www.stat.washington.edu/vminin On Apr 1, 2013, at 9:04 AM, Liam J. Revell wrote: Hi John list. Can I add to this by suggesting (in case it is not clear from Joe's comment) that we should probably combine the variance among estimates obtained from different trees in the posterior sample with the sampling variance of any single estimate? One fairly sensible way to do this is to compute the variance due to phylogenetic uncertainty as the variance among estimates obtained from the trees of the posterior sample; and then to compute the sampling variance as the mean variance of the estimator from each tree; and then add the two variances them. The standard error of our estimate (computed as the mean across trees) is the square-root of this variance. To conduct a hypothesis test on the regression coefficient, then, you would compute the mean across trees and then the ratio of the parameter and its standard error should have a t-distribution with n-2 degrees of freedom for n taxa (not contrasts). To do this from a practical perspective from trees in 'multiPhylo' object (here trees) and data in x y for a simple bivariate regression, we do the following: # first define the following custom function ff-function(tree,x,y){ pic.x-pic(x,tree) pic.y-pic(y,tree) fit-lm(pic.y~pic.x-1) setNames(c(coef(fit),vcov(fit)),c(beta,var(beta))) } # now apply to all trees in your sample BB-t(sapply(trees,ff,x,y)) # total variance in beta estimated by varBeta-var(BB[,beta])+mean(BB[,var(beta)]) t.beta-mean(BB[,beta])/sqrt(varBeta) P.beta-2*pt(abs(t.beta),df=length(trees[[1]]$tip)-2,lower.tail=FALSE) I think that's right. All the best, Liam Liam J. Revell, Assistant Professor of Biology University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu mailto:liam.rev...@umb.edu blog: http://blog.phytools.org On 4/1/2013 11:10 AM, Joe Felsenstein wrote: John D asked: Given that things have been quiet
Re: [R-sig-phylo] pruning a tree
Hi Elaine. A simpler way to do method 1 that doesn't require na.omit or match is: # if there are species in the tree that are missing from the data: ss-species.to.keep[,1] # species in the data tree-birdtree tree.pruned-drop.tip(tree,setdiff(tree$tip.label,ss)) I would guess that method 2 isn't working because your species names are not row names in your data matrix. To have that, you could do: birddata-read.csv(H:/birddata_family_20130405.csv,header=T,row.names=1) All the best, Liam Liam J. Revell, Assistant Professor of Biology University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://blog.phytools.org On 4/5/2013 8:54 AM, Elaine Kuo wrote: Hello This Elaine. I tried to prune a phylogeny tree using two methods based on the code attached. Method 1 returned the tip.label sharing between birddata and birdtree. Method 2 returned nothing. Please kindly indicate why Method 2 failed to prune the tree. Also, please kindly explain the command “birdtree$tip.label[-na.omit” Thank you again. Code Method 1 Library(ape) birddata - read.csv(H:/ birddata_family_20130405.csv, header = TRUE) birdtree - read.nexus(H:/ birddata_family_20130405.nexus) setwd(H:) ## prune the phylogeny to include only species which are in the data set species.to.keep-read.csv(H:/ birddata_family_20130405.csv, header = TRUE) birdtree.pruned-drop.tip(birdtree,birdtree$tip.label[-na.omit(match(species.to.keep[,1],birdtree$tip.label))]) Method 2 Library(ape) birddata - read.csv(H:/ birddata_family_20130405.csv, header = TRUE) birdtree - read.nexus(H:/ birddata_family_20130405.nexus) setwd(H:) ## prune the phylogeny to include only species which are in the data set temp - name.check(birdtree, birddata) birdtree.pruned - drop.tip(birdtree, tip=temp$Tree.not.data) [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/ ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Exporting Ancestor Reconstruction from Mesquite to R for ouch
Hi Martin. Without more information or your input file, it's hard for me to evaluate why read.simmap is hanging. Did you run: tree-read.simmap(filename,format=nexus,version=1.5) If you have your discrete character data in R, you can also use the phytools function make.simmap to generate stochastic maps. Make sure that you are using a recent version of phytools (=0.2-26; =0.2-33 for uncertain/unknown tip states; =0.2-37 for non-symmetric transition matrices) as earlier versions are buggy. My advice would be to use make.simmap to generate stochastic maps (or read them from file), and then use OUwie (in the package OUwie) to fit a multiple optima OU model to each mapped tree. For discrete character x and continuous trait y (each vectors in the same order with names(x)=names(y)=species names); and a phylogeny, tree, do: library(phytools) library(OUwie) mtrees-make.simmap(tree,x,nsim=100) XX-data.frame(names(x),x,y) result-lapply(mtrees,OUwie,data=XX,model=OUM,simmap.tree=TRUE) This just gives you a giant list of the results returned by OUwie for each tree. It would probably be a good idea to organize your results of interest more sensibly than this so that they can be easily aggregated across mappings. Good luck. All the best, Liam Liam J. Revell, Assistant Professor of Biology University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://blog.phytools.org On 4/5/2013 6:13 PM, Martin Turcotte wrote: Dear forum, I am trying to conduct an ouch analysis and for it I must determine the ancestral selective regime at each node in my tree. To do so I need to conduct ancestral reconstruction on discrete unordered data. To my knowledge no package can do this in R (I hope I am wrong) and thus I used Mesquite's function to 'Trace Character History'. The problem is exporting this information to R. In Mesquite the only export option I see is 'Export Ancestral State Trace' but the only option is SIMMAP 1.5 format. I tried opening this file using phytools'read.simmap but it just hangs. I can open the file with FigTree and I can see the Nodel labels but when I export the tree as a nexus file it does not save the model labels. Any advice would be appreciated. thanks, Mart Martin Turcotte, Ph. D. Dept. of Biology University of Toronto at Mississauga 3359 Mississauga Road North, Mississauga, Ontario, Canada, L5L 1C6 http://individual.utoronto.ca/martinturcotte [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/ ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] statistically test whether two characters evolve dependently
Hi Peter. I'm not sure if this is what you mean, but Felsenstein (2005, 2012) has a method for estimating the correlation between discrete characters or a discrete and a continuous character in which the discrete trait is modeled using the threshold model from quantitative genetics. There is a stand-alone program for this called Threshml (http://evolution.gs.washington.edu/phylip/download/threshml/). I have a Bayesian MCMC version in the phytools package called threshBayes (http://blog.phytools.org/search?q=threshBayes). So far, threshBayes only allows binary discrete characters two traits; Threshml can analyze more than two traits - but I believe is still limited to binary discrete characters. Please let me know if it would be helpful to your research to extend threshBayes to allow for multistate (ordered, naturally) discrete characters. All the best, Liam Liam J. Revell, Assistant Professor of Biology University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://blog.phytools.org On 4/7/2013 6:35 PM, New Bio wrote: Dear all, Just started studying phylogeny. I know BayesTraits program can calculate the likelihood of the observed data given the model of two charaters evolve dependently or independently. I am wondering how to do it with R packages or functions for discrete and continuous characters? Best, Peter [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/ ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] statistically test whether two characters evolve dependently
Peter - I realized that this is not very clear. Threshml can analyze multiple continuous discrete characters, but the discrete characters must be binary. threshBayes has the same limitation, but is also limited to only two traits (binary-binary, binary-continuous, or continuous-continuous). - Liam Liam J. Revell, Assistant Professor of Biology University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://blog.phytools.org On 4/8/2013 8:27 PM, Liam J. Revell wrote: Hi Peter. I'm not sure if this is what you mean, but Felsenstein (2005, 2012) has a method for estimating the correlation between discrete characters or a discrete and a continuous character in which the discrete trait is modeled using the threshold model from quantitative genetics. There is a stand-alone program for this called Threshml (http://evolution.gs.washington.edu/phylip/download/threshml/). I have a Bayesian MCMC version in the phytools package called threshBayes (http://blog.phytools.org/search?q=threshBayes). So far, threshBayes only allows binary discrete characters two traits; Threshml can analyze more than two traits - but I believe is still limited to binary discrete characters. Please let me know if it would be helpful to your research to extend threshBayes to allow for multistate (ordered, naturally) discrete characters. All the best, Liam Liam J. Revell, Assistant Professor of Biology University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://blog.phytools.org On 4/7/2013 6:35 PM, New Bio wrote: Dear all, Just started studying phylogeny. I know BayesTraits program can calculate the likelihood of the observed data given the model of two charaters evolve dependently or independently. I am wondering how to do it with R packages or functions for discrete and continuous characters? Best, Peter [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/ ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/ ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] statistically test whether two characters evolve dependently
Thanks for clarifying this, Joe. My function for ancestral character estimation under the threshold model (ancThresh) allows multistate data - but the order of the states along the liability axis needs to be specified a priori by the user.** The relative positions of the thresholds are sampled (along with the liabilities of tips nodes) from their joint posterior probability distribution using MCMC. This method is still undergoing peer review, but it seems to work fairly well in simulations - so this also might be a viable approach for estimating the correlation between multistate threshold characters. (**In my manuscript about ancThresh I suggest that an information criterion like DIC could be used to identify the correct order - but a single true order must exist.) As for unordered multistate characters, I'm not sure how the threshold model could be used. Several people have made suggestions about this to me but I'm not sure I understand them. All the best, Liam Liam J. Revell, Assistant Professor of Biology University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://blog.phytools.org On 4/9/2013 7:22 PM, Joe Felsenstein wrote: New Bio wrote: Thanks, Liam. good to know. I think extending the tools to analyze traits of ordered/unordered multistates can be very useful. There are many interesting traits such as oxygent requirement (anaerobic, facultative, aerobic) of microbes, which is ordered multistates, and habitats (water, air, soil), which is unordered. For approaches like the threshold model, it is not simple to see how to handle multistate characters. Should we assume one scale? If so, how far from the 0/1 threshold should we place the 1/2 threshold? That becomes an additional parameter to be estimated. Or should we have an additional axis, so that 0/1 is on the x axis, while [01]/2 is on another axis? And then what do we do about state 3 if it also exists? That way lies madness ... or perhaps a good Ph.D. thesis topic. (This is in some way related to Transformation Series Analysis, which was an issue with parsimony methods. One imagined one's states arranged in a character transformation series which could even be a tree, the Character State Tree. Then one wanted to infer the phylogeny and at the same time also the CST, using parsimony as the criterion. In a sense what I am raising is the likelihood and modeling equivalent problem.) Joe Joe Felsenstein j...@gs.washington.edu mailto:j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] pruning tree of 10000 species
Hi Elaine. A tree of 10,000 tips written to file should not take up 3.9 GB. Even a 1000 Newick trees of this size should not take up this much space. Are you sure this is the size of your file? All the best, Liam Liam J. Revell, Assistant Professor of Biology University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://blog.phytools.org On 4/10/2013 7:04 PM, Elaine Kuo wrote: Hello This is Elaine. I downloaded a bid phylogeny tree of 1 species. The file of the whole tree (1 species) is about 3.9 GB. I want like to extract to 3000 species of them for my research. Here comes two questions: 1. Please kindly advise if R is able to input a phylogeny tree of 1 species. If possible, please kindly recommend the package. 2. Is there anything should be noted to calculate a data of 3.9 GB, such as the requirement of RAM memory. Thank you. Elaine [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/ ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] pruning tree of 10000 species
Hi Gavin et al. read.tree can accept a character string as input, so in theory you should be able to skip writing to file then reading in - which could be quite slow for very large trees. temptree - scan(MY_FILEPATH, what = , sep = \n, quiet = TRUE, skip = 0, comment.char = #, nlines=1) tree-read.tree(text=temptree) - Liam Liam J. Revell, Assistant Professor of Biology University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://blog.phytools.org On 4/12/2013 11:38 AM, Gavin Thomas wrote: Hi Elaine The online tool is limited to 2500 species because we have found the server essentially grinds to a halt if we go beyond that. Jan is correct about the original data set. Each file contains 1000 trees. I would not advise trying to read these into R in one go. It is possible to read one or trees from the file into R and this will be much more manageable. It's a little convoluted: # Scan the first line from the tree file. This will be a single newick string. If you want to use the second tree then change to skip=1 (this tells R how many lines in the inout file to skip. If you want to randomly select a tree then use skip=sample(1:1000, 1) temptree - scan(MY_FILEPATH, what = , sep = \n, quiet = TRUE, skip = 0, comment.char = #, nlines=1) # Now write out the text string to file. You now have a text file with a single tree cat(temptree, file=tree1.tre) # Read that tree back in as a phylo object tree - read.tree(tree1.tre) # So now you have your complete tree in R and you want to subset it to your 3000 tips. This is straightforward using droptip. You just need to read your list of 3000 species into R and use that in place of species_subset tree_small - drop.tip(tree, tip=setdiff(tree$tip.label, species_subset) You could loop across a whole set of trees to get a set of 1000 pruned trees (the file will still be big but would contain trees of just the taxa of interest): for (i in 1:1000) { temptree - scan(MY_FILEPATH, what = , sep = \n, quiet = TRUE, skip = i-1, comment.char = #, nlines=1) cat(temptree, file=tree1.tre) tree - read.tree(tree1.tre) tree_small - drop.tip(tree, tip=setdiff(tree$tip.label, species_subset) write.tree(tree_small, file=MyNewTrees.tre, append=TRUE) } None of this is elegant - any better suggestions would be very welcome! I hope that helps. Best Gavin ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/ ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/