[R-sig-phylo] Evolutionary Quantitative Genetics Workshop 2023
A reminder: applications due March 31. The Evolutionary Quantitative Genetics Workshop for 2023 will take place at Friday Harbor Laboratories of the University of Washington from June 4-9. The workshop will review the basics of theory in the field of evolutionary quantitative genetics and its connections to evolution observed at various time scales. One aim of the workshop is to build a bridge between the traditionally separate disciplines of quantitative genetics and phylogenetic comparative methods. The workshop involves lectures, discussions and in-class computer exercises. This workshop has been given yearly since 2011, and at Friday Harbor Laboratories since 2017. It was canceled in 2020, and given as an online workshop in 2021 and 2022. It is intended for graduate students, postdocs, and junior faculty. Information on the subject area, lecturers, costs and application form will be found at: https://fhl.uw.edu/courses/course-descriptions/course/evolutionary-quantitative-genetics-workshop-2023/ and more information, including details of the Workshop in recent years, will be found at https://eqgw.github.io Joe Felsenstein and Steve Arnold -- Joe Felsensteinfelse...@gmail.com, felse...@uw.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA - PS: please do not use j...@gs.washington.edu, which is an alias that some mail systems now mistake as indicating spam. ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Evolutionary Quantitative Genetics Workshop 2023
K-Hz7m0Vt54!lAc-_tyBKyUS0BwdXHfS_bddA8mSgcsrMp6Nly1GJM-UkNMs09pABZP0gIb89WvlGmsVldrV5B-5tMdJeYl2P7XvDP20eBi7Dw$ > > Introduction to Bayesian modelling with INLA (BMIN02) > May 22nd - 26th > https://urldefense.com/v3/__https://www.prstatistics.com/course/introduction-to-bayesian-modelling-with-inla-bmin02/__;!!K-Hz7m0Vt54!lAc-_tyBKyUS0BwdXHfS_bddA8mSgcsrMp6Nly1GJM-UkNMs09pABZP0gIb89WvlGmsVldrV5B-5tMdJeYl2P7XvDP09ie8kTQ$ > > Reproducible and collaborative data analysis with R (RACR02) > June 13th - 15th > https://urldefense.com/v3/__https://www.prstatistics.com/course/reproducible-and-collaborative-data-analysis-with-r-racr02/__;!!K-Hz7m0Vt54!lAc-_tyBKyUS0BwdXHfS_bddA8mSgcsrMp6Nly1GJM-UkNMs09pABZP0gIb89WvlGmsVldrV5B-5tMdJeYl2P7XvDP2JO-LW_Q$ > > Species Distribution Modelling With Bayesian Statistics Using R (SDMB05) > September 4th - 8th > https://urldefense.com/v3/__https://www.prstatistics.com/course/online-course-species-distribution-modelling-with-bayesian-statistics-using-r-sdmb05/__;!!K-Hz7m0Vt54!lAc-_tyBKyUS0BwdXHfS_bddA8mSgcsrMp6Nly1GJM-UkNMs09pABZP0gIb89WvlGmsVldrV5B-5tMdJeYl2P7XvDP0ZmkHHkg$ > > Multivariate Analysis Of Ecological Communities Using R With The VEGAN > package (VGNR05) > September 18th - 22nd > https:// > https://urldefense.com/v3/__http://www.prstatistics.com/course/multivariate-analysis-of-ecological-communities-using-r-with-the-vegan-package-vgnr05/__;!!K-Hz7m0Vt54!lAc-_tyBKyUS0BwdXHfS_bddA8mSgcsrMp6Nly1GJM-UkNMs09pABZP0gIb89WvlGmsVldrV5B-5tMdJeYl2P7XvDP3z5_1e7g$ > > -- > > Oliver Hooker PhD. > PR statistics > > [[alternative HTML version deleted]] > > ___ > R-sig-phylo mailing list - R-sig-phylo@r-project.org > https://urldefense.com/v3/__https://stat.ethz.ch/mailman/listinfo/r-sig-phylo__;!!K-Hz7m0Vt54!lAc-_tyBKyUS0BwdXHfS_bddA8mSgcsrMp6Nly1GJM-UkNMs09pABZP0gIb89WvlGmsVldrV5B-5tMdJeYl2P7XvDP1_L29Apw$ > Searchable archive at > https://urldefense.com/v3/__http://www.mail-archive.com/r-sig-phylo@r-project.org/__;!!K-Hz7m0Vt54!lAc-_tyBKyUS0BwdXHfS_bddA8mSgcsrMp6Nly1GJM-UkNMs09pABZP0gIb89WvlGmsVldrV5B-5tMdJeYl2P7XvDP1XIWsucQ$ -- Joe Felsensteinfelse...@gmail.com, felse...@uw.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA - PS: please do not use j...@gs.washington.edu, which is an alias that some mail systems now mistake as indicating spam. ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] phylogenetic correlation analysis
Whether one gets them from PGLS directly or from contrasts, one can get correlations by just inferring the covariance matrix and then calculating r(x,y) = Cov(x,y) / (Var(x) Var(y))^(1/2) where of course Var(x) is also Cov(x,x), and so on. You would not need a separate run to get correlations. Joe -- Joe Felsenstein felse...@gmail.com, j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] If my trait X cannot be regressed by body size, how can I rescue residuals corrected by the phylogeny and SE?
Gabriel Ferreira explained: > I will try to better explain my problem, and I really appreciate your time > to help me with this issue. > My study is a conventional ecomorph with linear and univariate > measurements > > So... Some of my traits are linear measures that can and must be > "corrected" by body size, such as tail length. I usually conduct such body > size corrections with phylogenetic regressions using *gls *func. from > *nlme*: > lots of details ... > So I could use the residuals of this regression in the phy ANOVAs ... Does > it make sense? > Well, I am afraid I am lost. Perhaps someone else here could explain the issues to me ... Joe -- Joe Felsenstein felse...@gmail.com, j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] If my trait X cannot be regressed by body size, how can I rescue residuals corrected by the phylogeny and SE?
If wants residuals of values of a trait in each species, taking into account within-species variation and phylogeny, what does it mean if those residuals correlate with those of some other character, or with an environmental variable? Just asking which R machinery to use might wait until it is clear what the intended task is and why that makes sense. Joe -- Joe Felsenstein felse...@gmail.com, j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] units of sigsq
Marguerite asked: > First - Joe - what do you mean by log(grams) has no units? The units of grams > is a unit, so log(mass) will have units of log-gm. As log is not the same as > 1/gm, log(gm) cannot be unit-free. I looked it up on Wikipedia, and was assured by it that Marguerite is right, log(weight) has the same units as weight, except you're supposed to call them log-gm. I do think that unless there is a calibration with time, the tree branch lengths are not in units of time but in units of base substitutions per site. And I remain confused on what the units are, if you compute a linear combination such as 2 log(wt) - 3 log(height). Which princip[al, le] components machinery does. Joe -- Joe Felsenstein felse...@gmail.com, j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] units of sigsq
Ted Garland asked: OK, Joe, that's for one trait at a time. > Would you please continue your discourse, but extend to multiple traits > and their covariances > OK, assuming that’s not a joke which it seems it was. If all characters are log of something, their variances all have units of sites squared per square substitution. But if you have different units in different characters each variance would have units. (CharXunit) squared times sites-squared per square substitution. A covariance would be (CharXunit times CharYunit) times sites squared per square substitution. If you had a principal component (usually misnamed a “principle” component) it is in terms of a linear combination of characters, and I am deeply puzzled how to give their units as they mix them. Joe -- Joe Felsenstein felse...@gmail.com, j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] units of sigsq
Folks -- If the trait is log(grams) then the trait is unit-free. The "time" is probably branch length from a phylogeny. That in turn (from DNA data) is usually DNA substitutions per site. So the units of the standard deviation aresites per substitution. But this is not the standard deviation, it is its square. So (wait for it ...)square sites per square substitution Now that is pretty weird. But years ago people pointed out to me that quantitative geneticists were accustomed to inferring variance components of crop yield. The yield might be in bushels per acre. So the units of its variance was: square bushels per square acre. Don't even try to think about how you square a bushel, or how many dimensions you have to go into to square an acre. Actually, you can think about them: a bushel is three-dimensional volume, and an acre is two dimensional area. So crop yield has units of meters, and variance of crop yield should have units of square meters. That way lies madness ... Joe - Joe Felsenstein felse...@gmail.com, j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] collapsing sets of nodes based on label values
Karla Shikev -- > I just want to take a tree with node labels (e.g. > bootstrap values from a RAxML analysis) and collapse all nodes with support > values below, say, 90%. People here have given you various ways of doing this in R. I am not very familiar with the R packages, but just wanted to add one more way. If you have the original set of trees that were used to get the bootstrap values, you can do a version of the bootstrap that makes a tree which shows only the groups that are present in 90% or more of the input trees. This is called an M(0.90) consensus tree (the 0.90 is in a subscript which I can't typeset in an email). It is available in the program Consense in my PHYLIP package, and also in some other phylogeny packages. If you prefer R you can use Liam Revell's Rphylip package, together with installing a copy of PHYLIP, as Rphylip serves as a front-end for PHYLIP within R. J.F. - Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Multivariate ASR with Discrete Characters
William Gearty -- > > Thank you Don and Julien for your suggestions. I was able to run threshml > using Rthreshml from the Rphylip package. However, now I'm not really sure > what to do with the results. Can I use the output to perform ancestral > state reconstruction/estimation for the continuous (3) and discrete (1) > characters? I see that phytools has ancThresh, but that only seems to work > with a single character. I don't know what are the limitations of Rphylip's interface with Threshml, but I can address Threshml itself -- as I wrote it. In principle the liabilities sampled at the interior nodes of the tree could be used to do ancestral reconstructions at those nodes, in a very straight- forward way. Just consider them, not the rest of the tree, and make a histogram of them at the node. However I never bothered to put in any interface to do that. However note that what Threshml does is to transform the liabilities to independent variables, using the current best estimate of the covariance matrix of liabilities. The Gibbs sampling (and at the tips, the Metropolis/Hastings sampling) occurs in the independent characters. Then you have to transform back to the liabilities before you have samples for those at the interior nodes. So matters are not simple, but in principle it can be done. Note that for continuous traits Threshml can also be used to sample values at the interior nodes. Of course this is a lot more computation than just using likelihood computations -- I've checked and the sampling does infer the same covariances in that case. In this all-continuous case the same issue of transforming to independent characters also comes up. Joe Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Phylogenetic varying slopes and intercepts model
Oscar Inostraza -- Is the variable x also evolving on the tree? If so you need to use standard phylogenetically-informed comparative methods to estimate the variances and covariances of changes in both characters. You may not be able to assume that y responds instantly to x. J.F. Joe Felsenstein, j...@gs.washington.edu Department of Genome Sciences, University of Washington, Box 355065, Seattle, WA. 98195-5065 USA [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Nantucket Phylogenetics DevelopeR Workshop
> > We are very happy to announce the 2nd edition of a graduate-level > workshop on phylogenetic method development R. The course will be four > days in length and take place at the University of Massachusetts > Boston's Nantucket Field Station from the 5th to the 8th of November, > 2019. Between this and the Workshop on Molecular Evolution (Woods Hole), the Applied Phylogenetics Workshop (Bodega Bay, California), and the Evolutionary Quantitative Genetics Workshop (Friday Harbor Laboratories), there seems to be a major competition to see who can have their course in the most picturesque marine laboratory. Joe ---- Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Link between Mahalanobis distance and PGLS
Guillaume Louvel asked: > > So my first question is: can we directly apply the Mahalanobis distance > to measure a kind of "phylogeny-corrected" distance between 2 vectors of > trait values for a list of species? Since we assume a brownian motion, > we know these vectors should be drawn from a multivariate normal > distribution with known covariance matrix. Therefore the Mahalanobis > distance seems perfectly appropriate to me, is it the case? It is appropriate. In fact this is in effect what regressing contrasts in trait Y on contrasts in trait X is doing. One can alternatively use a multivariate regression appoach, which is what Alan Grafen (1989) did, and the results are the same either way (in the simplest cases). Note that although the contrasts can be treated as independent observations, that is not true for the tip species values -- the Grafen "Phylogenetic Regression" does not treat the tip values as independent, and for the same reason pairwise distances between tips are not independent. > > I don't want to do a statistical test per se, I am rather interested in > ranking many traits according to their distance to a pattern of reference. I am unclear about what that means. > > My second subsidiary question is: can I apply this Mahalanobis distance > if my traits are binary (e.g. presence-absence of some sequence in the > genomes). In that case I know that my trait is not multivariate normal, > but considering that I have millions of traits, shouldn't I expect the > whole set to have some normal characteristics? Basically no. Although people have approximated binary traits by Gaussian variables (I think Paul Harvey and Mark Pagel did in their 1991 book), it is much more appropriate to use a threshold model. See my 2012 paper in American Naturalist or the earlier 2005 sketch of the method in Proc. Royal Society of London series B. A good paper agonizing about all this is: Maddison WP & FitzJohn RG. 2015. The unsolved challenge to phylogenetic correlation tests for categorical characters. Systematic Biology 64: 127–136 though I'd say that the problem is not as "unsolved" as they think. > Finally, if none of the approach above is justified, is there a > multivariate phylogenetic method for discrete/binary traits? Some kind > of adapted phylogenetic PCA ? See above. It does require MCMC, and cannot simply be done with distances. J.F. Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Extended Majority Consensus Topology (Allcompat Summary) in R? And some observations on ape's consensus() function
David Bapst asked: > > I was interested if anyone was familiar with R code that can estimate an > extended majority consensus tree (referred to as an 'allcompat' tree by the > sumt command in MrBayes)? This is a fully bifurcating summary of a tree > posterior, where each clade is maximally resolved by the split that is most > abundant in the considered post-burn-in posterior (i.e., that split which > has the plurality, if not the majority - the highest posterior probability > of any other competing, conflicting splits recovered within the posterior. > So, I guess one could also call these plurality consensus trees...). You can also get the Extended Majority Rule consensus tree from the Rconsense function in Liam Revell's package Rphylip, which is calls programs from my PHYLIP package. Consense in PHYLIP does output, in addition to the consensus tree itself, a list of partitions found in the input trees, and the frequency of each. Rconsense may be able to do that too. Speaking as the one who defined the EMR method, I need to make a warning: EMR makes a tree by ordering the partitions (splits) in order of their frequency. To make Margush and McMorris's Majority Rule consensus tree one simply goes down this list and takes all the partitions that occur more than 50% of the time. EMR continues further, in order to resolve the tree fully. It keep accepting partitions as long as they don't conflict with anything already accepted. But this means that two partitions could be found (say) 45% of the time each. They could both be compatible with the partitions in the majority-rule tree, but in conflict with each other. Which then gets included then depends only on which one is encountered first as one goes down the list of most-frequent partitions. And that will just be a matter of things like the order in which the tree containing them occurs among the input trees. That is one of the limitations of the EMR method. Note also that EMR is subtly different from finding the largest set of split (partitions) that are all mutually compatible. That will often be the same, but not always. The latter is called the Nelson Consensus Tree. Joe Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Estimating marginal ancestral states under a non-reversible model of evolution.
Jordan Gault asked: > I would like to do a worked example for an unequal-rates model as well but my > understanding is that the re-rooting method is inappropriate for > non-reversible models of trait evolution. How are marginal ancestral states > estimated for non-reversible models of trait evolution? Why do "unequal rates" make the model nonreversible? I assume that by unequal rates you mean not rates different in different sites, but forward and backward rates different at a single site. If that is the case, the equilibrium frequencies of the two states become correspondingly different, and the model is still reversible. The tree can be rerooted at any interior node and the marginal ancestral states at that node found by the usual likelihood "pruning" algorithm. J.F. Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] estimate ancestral state with OUwie models
Let me add more warnings to Marguerite and Thomas's excellent responses. People may be tempted to infer ancestral states and then treat those inferences as data (and also to infer ancestral environments and then treat those inferences as data). In fact, I wonder whether that is not the main use people make of these inferences. But not only are those inferences very noisy, they are correlated with each other. So if you infer the ancestral state for the clade (Old World Monkeys, Apes) and also the ancestral state for the clade (New World Monkeys, (Old World Monkeys, Apes)) the two will typically not only be error-prone, but will also typically be subject to strongly correlated errors. Using them as data for further inferences is very dubious. It is better to figure out what your hypothesis is and then test it on the data from the tips of the tree, without the intermediate step of taking ancestral state inferences as observations. The popular science press in particular demands a fly-on-the-wall account of what happened in evolution, and giving them the ancestral state inferences as if they were known precisely is a mistake. Joe Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Detection association between traits
Rafa Braga asked > > Does anybody know how to detect association between two traits while removing > the influence of the third one? I have three traits which are named a b and > c. I'm wanting to test the correlation of a and b while controling c as a > covariate. In other words, the correlation between trait a and b can be a > result of the correlation between b and c and I'm wanting to remove the > influence of c on b. This goes all the way back 100 years. Once you have inferred the variances and covariances of the three traits, you can compute the partial correlation between a and b. This is defined as the correlation between the residual of a when predicted by c, and the residual of b when predicted by c. When there is only one variable c, and when r(a,b) is the correlation of a with b, then the partial correlation is [ r(a,b) - r(a,c) r(b,c)] / [(sqrt(1 - r(a,c)^2) sqrt(1 - r(b,c)^2)] For likelihood ratio testing in a linear model, one could compare the likelihoods of models that did or did not have direct connection of a and b. RA Fisher also derived a distribution of the partial correlation coefficient. Joe Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA > > > > Yours, > Rafa > > [[alternative HTML version deleted]] > > ___ > R-sig-phylo mailing list - R-sig-phylo@r-project.org > https://stat.ethz.ch/mailman/listinfo/r-sig-phylo > Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/ -- Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Phylogenetic PCA and measurement error
Rafael and Liam -- > > So far as I know, there is currently no way to explicitly take into account > sampling error in computing principal components while also accounting for > the phylogeny. However, it is relatively straightforward to compute scores > for individuals from a PCA conducted on species means. > > This would look as follows (in which Xm is a matrix containing values for > species for each trait, and Xi is a matrix with the same number of columns > but containing values for individuals): > > pca<-phyl.pca(tree,Xm) > Si<-Xi%*%pca$Evec > > Then, if you have a separate vector containing species ID as a factor, you > could compute means and variances for each component by species. There are methods (not all implemented in R) for taking into account the within-species phenotypic covariation among individuals and also the evolutionary covariances between species (on a phylogeny). These include the method of Ives, Midford, and Garland (2007) and my own method (2008). The former assumes you know the within-species covariance matrix, the latter estimates it from the sampled individuals for each species. Both of these assume that the true within-species phenotypic covariance matrices are the same for all species. For my method, you can use Liam's package Rphylip to call my program Contrast if you also have PHYLIP installed. Once you infer these covariance matrices, those are the relevant sufficient statistics (if the distributions are multivariate normal). The PCA axes for either covariance matrix can then be computed from those, or the canonical variates axes, which are the principal components of the between-species covariation relative to the within-species covariation. You don't need to use the PCA machinery until after you estimate these two covariance matrices. Joe Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] phylogenetic signal with sample sizes but no standard errors
There is also my C program Contrast, which implements a method from a 2008 paper I wrote: Felsenstein, J. 2008. Comparative methods with sampling error and within-species variation: contrasts revisited and revised. American Naturalist 171: 713-725. This estimates the within-species covariances and the between-species evolutionary variances.It is not an R program but can be accessed through Liam Revell's package Rphylip, if you also have my (non-R) package PHYLIP installed. A pretty good (but not ML) estimate of the within-species phenotypic variance can be gotten by pooling the within-species sampling error. The harder part is using that to correct one's estimate of the covariances of the between-species change, which using ordinary methods will have some within-species variation mixed in. Joe Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] phylip file compression ratios
Jacob Berv noted: > > I noticed today that the compression ratio for an interleaved phylip file > (zip compressed) was about 84:1, (390MB uncompressed —> 4.6MB compressed) > whereas the compression ratio for the same data non-interleaved was a much > worse 3.4:1 (390 MB uncompressed —> 113.9 MB). Not knowing much about how zip > compression actually works - I thought this might be an interesting > observation for the group… Interleaved sequences have blocks of (say) 50 bases. Successive lines may repeat a whole block or nearly repeat it. I wonder whether that makes the interleaved format easier to compress. I would guess that the compressibility of interleaved sequences would be highest when the sequences are closely related. In that case there would be 50-base blocks of nearly identical sequences. With less closely related sequences the compressibility should be much lower. Joe Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Average Aminoacid Identity tree with bootstrap support. Is it possible?
Salvador Espada Hinojosa -- The problem, of course, is that a random substitution on an interior branch of a tree increases or decreases the size of more than one distance in the distance matrix. The distances aren't independent in their statistical noise. So you can't just sample distances after the distance matrix is computed. J.F. Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] HKY GTR distances
Emmanuel wrote: > > There is no distance formula for HKY or GTR model. For GTR, Rodrı́guez et > al. developed a procedure to calculate a distance (also in Yang's 2006 > book). An example is given below with the woodmouse: > > matlog <- function(x) { > tmp <- eigen(X) > V <- tmp$vectors > U <- diag(log(tmp$values)) > V %*% U %*% solve(V) > } > > tr <- function(x) sum(diag(x)) > > data(woodmouse) > > PI <- diag(base.freq(woodmouse[1:2, ])) > Ft <- Ftab(woodmouse[1:2, ]) > F <- Ft/sum(Ft) > X <- solve(PI) %*% F > -tr(PI %*% matlog(X)) > > You have to call this code for each pair of sequences (or wrap it in a > function). > > For HKY, Yang mentioned a procedure Rzhetsky & Nei (1994, J Mol Evol). > Strictly speaking there *is* a formula for GTR, which I think may be equivalent to the above. The formula is given in my book "Inferring Phylogenies" on page 209 as equation 13.24. Note that for both of these, the equilibrium frequencies of the bases are estimated from the empirical frequencies in the two sequences. That means that for each distance, the equilibrium frequencies are somewhat different, as different sequences are being used to estimate the base frequencies. We are not, in the use of these formulas, estimating one overall set of equilibrium frequencies and then using that for all the the distances in our data set. The Rzhetsky-Nei 1994 distance formula is, I believe, an approximation, though probably a very good one. It is not quite the same as the estimate you would get by maximizing the likelihood. Distance formulas that involve empirically estimated base frequencies, which all of these do, have in common the problem that one either estimates those separately for each pair of sequences, or jointly estimates them from the whole dataset, without using a tree in the process. J.F. Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
[R-sig-phylo] Fwd: Workshop: UWashington.EvolQuantGenetics.Jun5-9
Evolutionary Quantitative Genetics Workshop Friday Harbor Laboratories, University of Washington, 5-9 June 2017 Non-credit workshop. Participants arrive at Friday Harbor Labs on Sunday, June 4, lectures and exercises occur June 5-9, and participants depart on Saturday, June 10, 2017. Application deadline March 1, 2017. Application forms and details here: http://tinyurl.com/EQG2017 Web page: http://depts.washington.edu/fhl/studentSummer2017.html#SumB-genetics Instructors: Dr. Joe Felsenstein Department of Genome Sciences University of Washington, Seattle j...@gs.washington.edu Dr. Stevan J. Arnold Department of Integrative Biology Oregon State University, Corvallis stevan.arn...@oregonstate.edu Previous versions of this 5-day workshop were given at the National Evolutionary Synthesis Center (NESCENT) at Duke University in Durham, North Carolina from 2011-2013, and then at the National Institute for Mathematical and Biological Synthesis (NIMBioS) at the University of Tennessee in Knoxville from 2014-2016. Like past versions, the Friday Harbor version will review the basics of theory in the field of evolutionary quantitative genetics and its connections to evolution observed at various time scales. The aim of the workshop is to build a bridge between the traditionally separate disciplines of quantitative genetics and comparative methods. Quantitative genetic theory for natural populations was developed considerably in the period from 1970 to 1990 and up to the present, and it has been applied to a wide range of phenomena including the evolution of differences between the sexes, sexual preferences, life history traits, plasticity of traits, as well as the evolution of body size and other morphological measurements. Textbooks have not kept pace with these developments, and currently few universities offer courses in this subject aimed at evolutionary biologists. Evolutionary biologists need to understand this field because of the ability to collect large amounts of data by computer, the development of statistical methods for changes of traits on evolutionary trees and for changes in a single species through time, and the realization that quantitative characters will not soon be fully explained by genomics. This workshop aims to fill this need by reviewing basic aspects of theory and illustrating how that theory can be tested with data, both from single species and with multiple-species phylogenies. Participants will use R, an open-source statistical programming language, to build and test evolutionary models. The workshop involves lectures and in-class computer exercises. You can consult the 2016 tutorial website for examples, it will be found at http://www.nimbios.org/tutorials/TT_eqg2016 The intended participants for this tutorial are graduate students, post-docs, and junior faculty members in evolutionary biology. The workshop can accommodate up to 35 participants. Guest instructors will include: * Marguerite Butler, Biology, Univ. Hawai'i, Mānoa * Patrick Carter, Evolutionary Physiology, Washington State University, Pullman * Adam Jones, Biology, Texas A University, College Station * Brian O'Meara, Ecology & Evolutionary Biology, Univ. of Tennessee, Knoxville * Josef Uyeda, Biological Sciences, Virginia Tech, Blacksburg Cost: $1000 to be paid to Friday Harbor Laboratories. This fee will cover housing and meals at FHL and all other workshop expenses, except travel. Participants who have been admitted to attend will make their payment prior to arrival at FHL. Details of payment by credit card or check will be provided once the applicant has been admitted to attend. J.F. ---- Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] simulating continuous data
Bryan McLean -- > I’m working to simulate multiple continuous characters on a known > phylogeny (using several of the standard models), and I want to compare > properties of the simulated datasets to an empirical dataset. My question > is: what is the standard method for ensuring that those datasets > (simulated, empirical) are actually directly comparable, i.e. scaled > similarly? Does this involve specifying a sensible root state (e.g. > ancestral reconstruction) OR just rescaling one or the other datasets > before or after the analysis? Forgive me if this is a bit of a naive > question, just trying to get a sense of standard practices. > It would seem to depend on what you consider to be "scaled similarly". When I simulate multiple characters with correlated Brownian Motion, I specify a covariance matrix for the evolutionary changes, as well as a starting vector of means. Using a matrix square root of the covariance matrix, one can transform the characters so that the covariance matrix of the new characters is an identity matrix. Those are easy to simulate up the tree, and then one transforms back to the original characters. I do this with my own C programs, but it can be done in R too. But what does it mean to be "scaled similarly" ? Joe Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] BiSSE/MacroCAIC on Non-Ultrametric Tree with Polytomies?
Ted commented: > Nice shot across the bow! > Well, thanks but I have been shooting and shooting and shooting, with regard to issues like regressing one character on another and then analyzing only the residuals phylogenetically, and in this case issues like not thinking about how the environments changed along the tree. Not much effect so far so you can expect more shots across the bow. Joe ---- Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] BiSSE/MacroCAIC on Non-Ultrametric Tree with Polytomies?
Brian Gill -- > U between a discrete binary predictor > (Latitude: Colorado/Ecuador) and a continuous response (species richness). > > e at the influence of a discrete > binary trait on richness, but I'm not sure if my tree is suitable or if > there is a better approach. > > 1) Diversitree package BiSSE > 2) Caper package using MacroCAIC > > Any suggestions would be greatly appreciated- And what do these methods assume about how that "discrete binary predictor" evolves along the tree? Joe - j...@gs.washington.edu Joe Felsenstein, Department of Genome Sciences and Department of Biology Box 355065, University of Washington, Seattle, WA 98195-5065 [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Testing for relationship between one categorical and one continuous variable in a phylogenetic framework.
Sean -- ... or, if you want to do it *really correctly*, you can use the threshold model of Sewall Wright for the discrete character and use the MCMC approach that I proposed in 2012: Felsenstein, J. 2012. A comparative method for both discrete and continuous characters using the threshold model. American Naturalist 179: 145-156. which is implemented in my program Threshml which can be called from Liam Revell's Phytools R package. It also works for multiple threshold characters and multiple continuous characters. Joe Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Determining Order of Trait Evolution
Gavin Leighton asked: > > > > > I have 500 trees of 80 species downloaded from birdtree.org and am > > primarily interested in two traits. I have used PGLS to determine the > > traits are related but would ideally like to test if there is an order to > > trait evolution. To complicate matters one trait (Trait A) is continuous > > while the second (Trait B) is presence/absence. I was hoping someone > could > > direct me to methods (assuming they exist) that would allow me to > determine > > the estimated value of Trait A before a gain of Trait B evolves in a > > lineage. > > > > A superior model (if I do say so myself) is the threshold model of Sewall Wright (1934). I have described in a paper in American Naturalist in 2012 how to use it to model discrete traits on a phylogeny. It allows the case of one trait continuous and another discrete. It is implemented in my program Threshml, which should be callable from Liam Revell's "phytools" R package. However it does not precisely answer the question you posed, but just asks whether the two traits evolve in a correlated fashion. J.F. Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] [MORPHMET] model II regression statistics PAST
A few days ago, Jim Rohlf said, in response to Patrick Arnold's query about how to test the significance of major axis "regression" > > I don't know about how to do it in PAST, but note that because the slope of > the standard major axis "regression" line is just the signed ratio of two > standard deviations, its square follows the F-distribution if one assumes > that the two variances are based on random samples from two normally > distributed populations. Thus a test for a slope = 1 corresponds to a > 2-tailed test for the equality of variances. One can easily look up > probabilities in an F-table or compute confidence limits using standard > methods. Note that if the samples are across different populations or > species, as in many (most?) morphometric applications, then these assumptions > do not hold. This problem was brought to my attention by Paul Harvey in the late 1970s. I suggested that he look for a rotation angle (theta) that would maximize the likelihood under a model in which the two (new) variables are independent Gaussian variables. This also allows a likelihood ratio test of the assertion that theta is zero. The estimate of theta provides an estimate of the angle of the major axis. These can be easily generalized to multiple populations, even when their variances are unequal. The likelihood ratio test is done with a test for equality of correlation matrices which will be found in a multivariate statistics book by Morrison. I am not sure where Paul published this, but I think he did in an appendix to a paper of his in some multiauthor volume. J.F. Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] summary stats for comparative methods p-values
If the 100 trees are trees sampled from a Bayesian posterior, or else trees from bootstrap samples of your data, then you might just take the estimates from each tree (say estimates of a regression coefficient). Consider their distribution and ask whether the null hypothesis value (such as having slope 0) is in the tail of that histogram. The overall P value will be the proportion of estimated slopes that are below zero (unless you want to do a two-tailed test). Under the null hypothesis this will have the proper rejection probability. As you take more and more trees the power increases some, but the type I error rate stays the same. If the 100 trees are something else, such as the personal opinions of 100 of your friends, then there is no statistical justification for this. J.F. Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Pairwise Distances
Pedro -- > Well, for now I just want to estimate the raw distances. But thinking about coalescents is certainly interesting. Would you have a suggestion for either ways of thinking? Well, you must be wanting to use those raw distances to infer something. Rates of migration? There are extensive discussions in the book by John Wakeley. Also in the elementary population genetics text by Nielsen and Slatkin. J.F. ---- Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Pairwise Distances
Pedro Taucce -- > Is there a way to estimate pairwise distances between and within groups of > sequences (in my case, each group is a species with lots of individuals)? I > used to do it with MEGA, but now I use Linux only and MEGA isn't getting > along with it. > > The closest way I figured out is the function dist.dna from the ape > package. But I think it does not estimate distances between groups. > You want to use distances between groups? But you don't want to think about coalescents? J.F. Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Accounting for phylogeny in binary predictor, binary response data
I do not know offhand whether there is an R implementation, but how about Mark Pagel's 1994 method for testing whether two 0/1 characters changing along a ohylogeny are changing independently? J.F. - j...@gs.washington.edu Joe Felsenstein, Department of Genome Sciences and Department of Biology Box 355065, University of Washington, Seattle, WA 98195-5065 [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Accounting for phylogeny in binary predictor, binary response data
I have received phishing spam through the R-sig-phylo mailing list (pretending I had expressed interest in renting something from them) from anya_phill...@casetotours.xyz disguised as a reply to my comment. So that address should immediately be removed from the list. Joe - j...@gs.washington.edu Joe Felsenstein, Department of Genome Sciences and Department of Biology Box 355065, University of Washington, Seattle, WA 98195-5065 [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] testing for variation in rates of evolution among traits
Warning: You'd have to ensure that the traits for which you are comparing rates are evolving independently, so that they do not covary in their evolutionary changes. I assume Dean Adams's paper involves some way of coping with this. The issue of log-transforms that Ted raised is very important, otherwise big measurements will tend have higher rates of evolution. Joe j...@gs.washington.edu [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] [MORPHMET] Re: Stability of p-values (physignal and testing for morphological integration)
Alberto Gallano asked: Would this also be the case in the situation where n is small enough (~ 15) to enumerate all possible unique permutations? I was under the impression that such an 'exact' test provided the true p-value without error. In a case that small, one might be able to evaluate all permutations. (There are 1.3 x 10^12 permutations, but maybe you need combinations, which are fewer). Yes, you would then have an exact P value. But of course that is not the same as an infinitely powerful test -- after all an ordinary t-test gets you an exact P value when its assumptions are met. J.F. Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] [MORPHMET] Re: Stability of p-values (physignal and testing for morphological integration)
A number of people have suggested that P values should stabilize after a number of samples (in a permutation test) that depends on the data set. I suspect that these were unintended misstatements. As Dennis Slice has mentioned, one can regard each permutation in the permutation test as a random sample from a distribution. Comparing a test statistic X to its value in the data (say, Y), each permutation draws from a distribution in which there is a probability P that X exceeds Y. So each permutation is (to good approximation) a coin toss with probability P of Heads. There obviously no number of tosses beyond which the fraction of Heads stabilizes. The fraction of heads after N tosses will depart from the true value P by an amount which has expectation 0, and variance P(1-P)/N. This is a fairly slow approach of the fraction of Heads to the true value. So to get twice as close to the true P value, one needs 4 times as many permutations. And this need for more and more samples continues indefinitely. There is no sudden change as one reaches a threshold number of permutations. But that's what you really meant, right? Joe --- Joe Felsenstein j...@gs.washington.edu [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] ancestor vs. change plots
Milton Tan asked: I have a question that is perhaps esoteric, since it's on a method I don't see used often. I am looking at the dynamics of body size evolution, and have come upon ancestor-vs-change plots described in Alroy 2000 (Understanding the dynamics of evolutionary trends, Paleobiology). This is interesting because it will allow me to see if rate of body size change depends on body size. I haven't seen this method widely used, so for anyone unaware how this works: for each branch, you plot the ancestral state vs. the amount/rate of change along the branch. If the question were just whether the rate of change of the character (body size) depended on its value, then another way would be to look for a transformation of body size that made the instantaneous variance of change constant. i.e., does change in log(size) or in sqrt(size) have constant variance? There are parameterized transformation such as y = (x^p - x^{-p})/(2p) or elsey = ( x^p - 1) / p + 1 for which you could estimate the parameter p by ML. Joe Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Sister-clade analysis of discrete characters
I mentioned earlier the alternative of having two-state discrete characters each modeled by an underlying quantitative character with a developmental threshold. This can in principle be extended to three or more states. In fact, Sewall Wright's original example in 1934 had three states. But there are two difficulties: 1. How far away from each other the two thresholds would be becomes something that needs to be estimated. If there are only two states, then since the underlying scale is not directly observed, one can place the threshold anywhere on the underlying liability scale, most conveniently at zero. But how far from that the second, third, (etc.), thresholds should be placed then needs estimation as parameters of the model. 2. It is also not obvious whether they third state uses the same underlying scale. For three states, one could have two underlying liability scales, with each state determined by some region in this two-dimensional space. Inferring that would be a continuous-state counterpart to Mary Mickevich's transformation series analysis of 1982 (see my book, page 81 for references). So thus far the threshold model implementations do not allow three or more states. But the possibilty should be mentioned here. Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Sister-clade analysis of discrete characters
This is not a sister-clade method but ... If you are happy with a model in which there are underlying polygenic quantitative characters, covarying ones whose evolution is modeled by Brownian Motion model, and some two-state discrete characters arise by imposing a developmental threshold on each quantitative character, you might look at my 2012 paper in American Naturalist: http://www.jstor.org/stable/10.1086/663681 The method is designed for use with a known phylogeny and infers the evolutionary covariances of the underlying quantitative characters (the liabilities). My C program Threshml analysis this model by MCMC sampling. It can, I believe, be called by Liam Revell's R package phytools. This threshold model, a phylogenetic version of one which originated in 1934 with Sewall Wright, is mentioned in the Maddison and FitzJohn paper that William Gearty cited. It is just briefly mentioned by them, I gather because the authors forgot about it until the last minute when writing the review. Nevertheless it is getting increasing use. Joe Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] multipage pdf of a huge tree
Jonas Eberle wrote: thanks a lot! I didn't knew splitplotTree yet. Great function! However, my tree has several thousands of tips (yes, it's a bit crazy but unfortunately necessary...) and I guess it's only possible to split it on two pages with splitplotTree. Or am I missing something? It's not in R (unless available through Liam Revell's phytools package), but in PHYLIP the tree-drawing programs Drawgram and Drawtree have the capability of splitting a plot into a rectangular array of plots, and putting these out onto separate files (not PDFs, but Postscript is possible). This was intended to help people make large posters using printers that can only do a single page. However ... I do not see why this is necessary. Most tree-drawing programs should be able to write a file that has the large tree plotted on it. If you don't want to print the resulting tree on paper, it would then be possible to view the tree in an application such as Adobe Acrobat Reader and zoom in on it and see the tiny branches and their labels. Making multiple plots for one tree would probably confuse the matter. Or is there something I am missing here? J.F. Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] multipage pdf of a huge tree
Jonas Eberle -- thank you for your answer! Actually, I used to make large size pdfs with a tiny font size in these cases. It is then even possible to print this single-page-pdf on multiple pages in Acrobat Reader. The problem was that my current tree is really huge - with about 23000 tips. This took me to the limits of this approach, since pdf size is limited to 200 inch by Adobe and there also seems to be a lower limit for font size in pdfs (at least during export form R?). I didn't knew that Drawtree is able to split the tree over multiple pages. I guess this is an option in the command line version? I will check that out. In the mean time I found a way to do the job in R (see my post before) that seems to produce reasonable results. PHYLIP programs do not enter settings on the command-line. They have a menu. The Drawgram and Drawtree programs have a choice that gives a submenu in which multiple pages can be selected. This submenu is not available in the GUI (Java front end) version of those programs, but is available in the character-mode menu that appears if the program is run from the command line. It is the selection made by typing the character #. J.F. Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Phylogenetic analysis with sequencing error
George Shireff asked: I was wondering if there is a way to incorporate sequencing error in phylogenetic analysis, if I know there is sequencing error but it's randomly distributed. I imagine this could be as simple as setting the probability of the observed base at each tip to 0.99 rather than 1, with a probability of 0.01/3 rather than 0 for each of the other tips (for a sequencing error of 1%). I was hoping that something like pml (phangorn) can be modified to do that, but I haven't been able to figure it out myself yet. How to do this in likelhood computations (and thus in either likelihood or Bayesian methods) is explained in my book, on pages 255-266 in the subsection of Chapter 16 called Handling ambiguity and error. I think that this is the first published treatment of that. My colleagues Mary Kuhner and Jim McGill published a paper in 2014 on this, with simulations showing that it helps to model the sequencing error: Kuhner MK, McGill J Correcting for sequencing error in maximum likelihood phylogeny inference. G3 (Bethesda). 2014 Nov 4;4(12):2545-52. doi: 10.1534/g3.114.014365. I believe that one major phylogeny program has very recently added code to model this. We have a version of Dnaml for PHYLIP that can do it, and hope to release it soon. As George Shireff saw, it is actually easy to do, and the computations are not any slower. J.F. Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Phylogenetic analysis with sequencing error
Brian O'Meara added: Something like that should work. Unlike the case of ambiguity, where if you see, say, a Y at a site you'd give the probability of a C or a T at that site each 1, not 0.5 (Felsenstein's book has a good discussion of this) I think you're right that you'd want the probabilities to add up to one across bases: with sequencing error, the probability of it being A in the sequence given that it truly is A in the species is 99%. I poked around phangorn and didn't see an easy way to do this, but it should be possible (might require going into the C code to do this). To emphasize what Brian said: the probabilities at a site, in this computation, are *not* the probabilities of alternative outcomes. They are the probabilities of one outcome given four different underlying situations. Thus they don't have to add up to 1. For the ambiguity case, in my book I warned: You may be tempted to make the quantities (1/4, 1/4. 1/4, 1/4), but you should resist this temptation. For simple models of sequencing error the four numbers can add up to 1. Can, but don't have to. If an A is more likely to be misread than a G (say), then you get numbers that don't add up to 1, and that's OK, you should not worry about that, because they aren't the probabilities of four different outcomes. If we made up a 4 x 4 table where the first row shows the probabilities of the four observations given that the base really is A, and second row is the probabilities given that it really is G, etc., then the four quantities you want at one site at one tip of the tree are the four numbers in a column. (Which column depends on what you observed). The rows have to add to 1; the columns don't have to. J.F. Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] hierarchical model with phylogenetic dependence term
This question was asked by Peter Smits and by Edwin Lebrija Trejos. In Peter Smit's question it was this: I have a similar question to Edwin. I too am working with a hierarchical Bayesian model, though I've implemented it in stan. I've included a phylogenetic random effect term, which is modeled as being distributed as multivariate normal with known covariance matrix up to a constant, sigma_{phy}. This follows Lynch '91 Am Nat and Housworth et al. '04 Am Nat by drawing on the similarity with the animal model from quantitative genetics. My question is about the scaling of the covariance matrix: is it necessary to have the the diagonal terms satisfy x = 1 and all the off diagonals to be 0 = x 1? I have a time scaled phylogeny from which I have my covariance matrix, so currently all elements of the matrix are not scaled so that the greatest distance from the root to a tip is 1. Currently, the elements of the matrix are just the sum of the shared branch lengths. Is this appropriate? Why or why not? I hope people will correct me on this, but my take is: 1. To infer a variance term for the phylogenetic random term one has to scale that term somehow, and then the variance will specify how many of those scaling units are in this term. If you had the tree depth be 2 instead of 1, the variance inferred would then be half as great, because it would be saying how many multiples of 2 rather than how many multiples of 1. 2. I have not had time to read through all of Edwin's code to see exactly what the model is. However, note that there is a distinction between individual effects that are on a whole species, and individual effects that are on a single sampled individual. The latter are taking into account that the mean phenotype of each species is only known from a finite sample of individuals. So it is taking into account within-species phhenotypic variation and finiteness of sample sizes. The relevant methods there are by Ives, Midford and Garland (2007) and by me (2008). Ives et al. assume within-species phenotypic variances and covariances are known, I give methods for inferring them. The methods of Lynch and Housworth are in effect either assuming a sample size of 1 for each species, or are considering their individual effect to be on the whole species. Within-species phenotypic variation can be a substantial problem, as Ricklefs and Starck (1996) noted. In their example, the largest contrasts were between closely-related species. They suspected that this was due to within-species phenotypic variation (which shows up as variance due to sampling of the individual specimens measured). It causes trouble because the small branch lengths between closely-related species are an inadequate predictor of how different the species means will be. In effect, the model is wrong so some of the changes attributed to between-species evolution are actually within-species sampling variation (phenotypic variance). Joe Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Midpoint rooting routine?
Liam said: Joe's correct that this is not yet in our project to create an R interface for PHYLIP (Rphylip, https://github.com/liamrevell/Rphylip). As soon as I get a chance to work on this project today, I will add it. That may be hard to do, as Retree is interactive and has two levels of menus. Joe Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Sampling from all bifurcating and multifurcating trees
Eduardo Ascarrunz ear...@gmail.com wrote: Also, I figured out I could work with unrooted trees. I suspect the procedure would be similar to your rooted method, wouldn't it? There is a 1-1 correspondence between n-species rooted tree topologies and (n+1)-species unrooted tree topologies (this is discussed in Chapter 3 of my book). So you can simply generate a bifurcating rooted tree, or a multifurcating rooted tree, of n-1 tips, and then unroot it, making the previous root now be species n. Then you get a randomly sampled unrooted tree. This works for tree topologies but not, I think, for labeled histories. Joe Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] R help generating a heat map
One can sample from all possible rooted labeled histories by randomly bifurcating lineages, and at the end assigning the names randomly to the tips. Note that a labeled history is not the same as a tree topology. One can sample randomly from all topologies of rooted bifurcating trees with labeled tips by adding the species one by one to a one-species tree, each time choosing to split them off from one of the branches. The enumeration of all such trees is based on counting all possible ways of doing this, so choosing one such sequence should choose one of them at random. Jim Rohlf has a paper with an algorithm equivalent to this in (I think) Systematic Zoology but I cannot locate it at the moment. For rooted multifurcating trees a similar method could be used. In my 1978 paper in Systematic Zoology I showed that each such tree corresponded to way adding the species 1 to n in order, where at each step we choose equiprobably from among all branches and all interior nodes (so that if there are 13 branches and 6 interior nodes we split off one of these 19 at random). The enumeration algorithm for which this is based is also discussed in detail in Chapter 3 of my book. Joe On Mon, Dec 23, 2013 at 8:57 PM, Eduardo Ascarrunz ear...@gmail.com wrote: Hi Liam, Thank you! That looks clever. How does this method bias the sampling? I think it could be useful for test running my code anyway. Looking forward your findings, and all the best, E. On 24 Dec 2013 04:49, Liam J. Revell liam.rev...@umb.edu wrote: Hi Eduardo. You could try something like this: randomFurcTrees-function(n,N=1){ foo-function(n){ t-di2multi(rtree(n,br=sample(c(0,1), size=2*(n-2),replace=TRUE),rooted=FALSE)) t$edge.length-NULL t } trees-if(N1) replicate(N,foo(12),simplify=FALSE) else foo(n) if(N1) class(trees)-multiPhylo return(trees) } What this does is it generates random bifurcating topologies (using rtree in ape) with branch lengths that can be 0 or 1; then it collapses all zero length branches to polytomies. This is (demonstrably) *not* the same as picking trees at random from the set of all bi- and multifurcating topologies. It's not immediately obvious how you could do that, but I'll think about it. All the best, Liam Liam J. Revell, Assistant Professor of Biology University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://blog.phytools.org On 12/23/2013 10:15 PM, Eduardo Ascarrunz wrote: Hello everyone, Newbie here. I'm looking for a way to generate random trees of N tips, allowing multifurcations. My N would be 12, so it wouldn't be practical (nor possible) to work with all the possible trees (cf. allFurcTrees). I'd be happy with a set of 1000 trees, sampled equiprobably. I'd much appreciate any help. Best to all, E. [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r- sig-ph...@r-project.org/ [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/ -- Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Transforming data for OU model
Ted Garland wrote: Our programs are in DOS (What's DOS? The last operating system that worked ...). However, I think you can also do this in R somewhere, maybe phytools? Liam Revell, want to jump in here? Just to remind people: DOS (also called MSDOS) executables programs can be run in Windows using the Command Prompt tool, which you will find in the Accessories menu that is found in the menu opened by the All Programs tab in the Start Menu. Joe Joe Felsenstein, j...@gs.washington.edu Dept. of Genome Sciences, Univ. of Washington Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] PGLS vs lm
Tom Schoenemann asked me: With respect to your crankiness, is this the paper by Hansen that you are referring to?: Bartoszek, K., Pienaar, J., Mostad, P., Andersson, S., Hansen, T. F. (2012). A phylogenetic comparative method for studying multivariate adaptation. Journal of Theoretical Biology, 314(0), 204-215. I wrote Bartoszek to see if I could get his R code to try the method mentioned in there. If I can figure out how to apply it to my data, that will be great. I agree that it is clearly a mistake to assume one variable is responding evolutionarily only to the current value of the other (predictor variables). I'm glad to hear that *somebody* here thinks it is a mistake (because it really is). I keep mentioning it here, and Hansen has published extensively on it, but everyone keeps saying Well, my friend used it, and he got tenure, so it must be OK. The paper I saw was this one: Hansen, Thomas F Bartoszek, Krzysztof (2012). Interpreting the evolutionary regression: The interplay between observational and biological errors in phylogenetic comparative studies. Systematic Biology 61 (3): 413-425. ISSN 1063-5157. J.F. Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] question about measurement error in phylogenetic signal (Krzysztof Bartoszek)
In addition to the references to papers by Hansen and Bartoszek, and by Ives, Midford and Garland, I would biasedly suggest this paper: Felsenstein, J. 2008. Comparative methods with sampling error and within-species variation: contrasts revisited and revised. American Naturalist 171: 713-725. The method estimates the within-species phenotypic variation (which, when you are analysing species means is the relevamt measurement error and also includes actual measurement error) and corrects for it. The software announced there is not in R, but I believe that Liam Revell's phytools package can call our program. Joe Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] question about analysing trait differences
Brian O'Meara wrote: The problem is reconstructing overlap down the tree (it's possible to do, but whether it's possible to do well is another question). One thing you could do to avoid this is to use the *other* method from Felsenstein's 1985 independent contrasts paper, taking pairs of species independent of other pairs. A way to do this: 1) Get a clade of size two (aka a cherry sensu Semple and Steel). Every tree has at least one. 2) Difference in overlap and diference in mass for the pair of taxa making up this clade becomes one point (perhaps standardize for time, and positivize it such that you subtract them in the order that leaves the mass difference positive). 3) Prune these two taxa off the tree. 4) Go back to step 1. When you're done, assuming no polytomies, you'll have zero or one species left. [it's possible to do with polytomies, too: if assumed to be hard, then just do a pair of taxa from a terminal polytomy and leave the rest for the next step. If assumed to be soft, then just take two from a terminal polytomy and discard the remaining taxa] Contrasts gives you Ntax - 1 contrasts. This gives you floor(Ntax/ 2) but avoids having to do reconstructions. I have code around to do this but can't find it at the moment, but it's not hard to write -- post again if you need help with it and no better ideas have been proposed. Note that this method (which goes all the way back to Salisbury in 1942 as mentioned in my 2004 book) requires that in step 3 prune does not mean using the pruning algorithm of likelihood evaluation. It means removing these two tips (and their shared most recent common ancestor), leaving behind nothing. So not calculating and adding to the tree any phenotypes for the most recent common ancestor. Thus if we find (chimp, bonobo) to be the cherry we remove them, leaving a tree such as (macacque, (gibbon, (orang, gorilla))); so now (orang, gorilla) is a cherry. Joe Joe Felsenstein, j...@gs.washington.edu Dept. of Genome Sciences, Univ. of Washington Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] question about analysing trait differences
I wrote: Thus if we find (chimp, bonobo) to be the cherry we remove them, leaving a tree such as (macacque, (gibbon, (orang, gorilla))); so now (orang, gorilla) is a cherry. but should have written ...leaving a tree such as ((macacque, (gibbon, (orang, (gorilla, human; so now (gorilla, human) is a cherry. Joe Joe Felsenstein, j...@gs.washington.edu Dept. of Genome Sciences, Univ. of Washington Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] statistically test whether two characters evolve dependently
New Bio wrote: Thanks, Liam. good to know. I think extending the tools to analyze traits of ordered/unordered multistates can be very useful. There are many interesting traits such as oxygent requirement (anaerobic, facultative, aerobic) of microbes, which is ordered multistates, and habitats (water, air, soil), which is unordered. For approaches like the threshold model, it is not simple to see how to handle multistate characters. Should we assume one scale? If so, how far from the 0/1 threshold should we place the 1/2 threshold? That becomes an additional parameter to be estimated. Or should we have an additional axis, so that 0/1 is on the x axis, while [01]/2 is on another axis? And then what do we do about state 3 if it also exists? That way lies madness ... or perhaps a good Ph.D. thesis topic. (This is in some way related to Transformation Series Analysis, which was an issue with parsimony methods. One imagined one's states arranged in a character transformation series which could even be a tree, the Character State Tree. Then one wanted to infer the phylogeny and at the same time also the CST, using parsimony as the criterion. In a sense what I am raising is the likelihood and modeling equivalent problem.) Joe Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Seeking list of nucleotide substitution models
Emmanuel Paradis answered Daniel Barker's inquiry: Is there a review or list of ~every specific nucleotide substitution model that has been proposed or used in the literature (with references)? Hi Daniel, Have you looked at the help page of dist.dna? The list of references there is focused on calculating distances but some are general. There are a few more references in my book too. I suggest you also look at Inferring Phylogenies. where you will find, in the section on Other distances in Chapter 13 that Andrey Zharkikh had a very good paper in 1994 covering many of the distances that had been invented up to that date, and explaining relationships between them: Zharkikh, A. 1994. Estimation of evolutionary distances between nucleotide sequences. Journal of Molecular Evolution 39: 315.329 Joe Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] testing for correlates of rates of evolution
Rob Lanfear wrote: In particular, Liam's using squared contrasts in y, so that's asking whether the absolute size of changes in y depends on x at the ancestral node. I might have missed something here, but that sounds very similar in principle to Freckleton's test of whether the variance of trait differences is unrelated to their absolute values [1], except that the latter looks for correlations between the absolute value of differences in x versus x at the ancestral node. It might be useful to consult that paper [1] to get some more ideas for how to interpret those kinds of results. If there is proportionality between standard deviation of change and the value of the character, that is essentially saying that the log(x) changes at rate independent of its value. Similarly, if the variance of change is proportional to the value of the character, that is essentially the same as saying that the square root of the character changes at a rate independent of its value. See this Wikipedia page: http://en.wikipedia.org/wiki/Variance-stabilizing_transformation Perhaps the whole test can be done by considering different transforms of the data (there are parameterized families of them) and use the likelihood to test values of the transform parameters (with appropriate correction of the likelihood by having a Jacobian term). Joe Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] From ClustalW2 Tree to Heat Map in R
Klaus Schliep wrote: There is quite some irony that in phylogenetic reconstruction often non-ultrametric methods are preferred, even though the time to the last common ancestor (LCA) should be for each extend species the same. However other fields use heavily ultrametric methods (hclust) even there exist no evident equivalent property like the LCA. The irony, such as it is, is due to the fact that we can only see amounts of difference, not amounts of time, and, darn it, different organisms have different biologies so they change at different rates. Which makes biology more interesting but makes molecular change less clocklike. Joe Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Why no branch lengths on consensus trees?
Emmanuel Paradis wrote: If you are Bayesian, the trees sampled from an MCMC are here for estimation including of the branch lengths, so you use them to compute some sort of consensus topology as well as its branch lengths. So it makes sense that MrBayes can do a consensus tree with branch lengths. I endorse the rest of Emmanuel's advice but let me quibble with this one. The posterior on trees may not consist mostly of trees varying around a single consensus. If the posterior had, for example, two modes, each centered around a different tree, a single consensus tree might not be appropriate, and branch lengths computed by averaging lengths over the two modes might not be a good guide to what the trees in the posterior looked like. I don't know enough about MrBayes features to know whether they have some way around this. There is a similar issue with parsimony methods -- the set of most parsimonious trees may have a consensus, which may well not be a most parsimonious tree. People who see the consensus of most parsimonious trees may not realize that the particular tree they are looking at is not most parsimonious. J.F. Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA (from 1 October 2012 to 10 December 2012 on sabbatical leave at) Department of Statistics, University of California, Berkeley, 367 Evans Hall, Berkeley, CA 94710 ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] Cluster tips into OTUs
Miguel Verdu -- I want to perform a mothur-like OTU-based approach, but based on phylogenetic instead of DNA sequence distances. Is there any way to cluster tips of a tree into OTUs determined by a threshold of phylogenetic distances?. In other words, I want to collapse all the tips connected with distances less than X into the same OTUs. I don't know which R program will do this (probably just a one-line if command) but the obvious method would be to take the distance matrix and reduce all the elements of it that are less than X to 0.000. Then run any distance method. J.F. Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA (from 1 October 2012 to 10 December 2012 on sabbatical leave at) Department of Statistics, University of California, Berkeley, 367 Evans Hall, Berkeley, CA 94710 [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] asymmetric transitions
Daniel Barker wrote: On ancestral state reconstructions. I've recently started using Ziheng Yang's terminology - of referring to reconstructions, derived from ML transition rates and equilibrium frequencies, as 'empirical Bayes' reconstructions. I believe this to be the most useful way to describe these methods. My question. Were does empirical Bayes stand, w.r.t. the Likelihood Principle? Empirical Bayes seems beyond the scope of the Likelihood Principle, or in violation of it. The biological hypotheses (here, of ancestral state) are not expressed as parameters of the model, so the relative support is not judged by a likelihood ratio. On the other hand, by only using the data at hand, empirical Bayes does comply with 'the irrelevance of outcomes not actually observed'. If anyone can provide or point to some thoughts on this, I'd be very grateful - for ancestral states or in general. It has long been recognized -- since Anthony Edwards's 1970 paper and Elizabeth Thompson's brilliant 1975 thesis and book -- that the interior node reconstructions are not, strictly speaking, MLEs. They are maximum posterior probability estimates instead. The root ancestral states can be considered MLEs if that state is one of the parameters of the model. If instead (as often done for DNA and protein data) the root ancestral state is considered to be drawn from the equilibrium distribution of the stochastic process, then they too are MPP estimates. I would only call them empirical Bayes estimators if one made an ML estimate of some stuff (the whole tree topology, which sounds like an extreme case) and then, assuming the correctness of that estimate, the ancestral states are inferred by being Bayesian. In that case probably the only thing that would be taken to have a prior on it would be the ancestral state. Then you could take the MPP as the modal estimate from the posterior. So, if I have understood his point correctly, Ziheng is formally correct, but it is a case where one is not being very Bayesian. For my money, you are a Bayesian not just if you use a prior, but if you are willing to use a controversial prior. And in this case the prior is pretty uncontroversial. Joe Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] asymmetric transitions
Mark Holder wrote: With respect to Jarrod's simulations, I have a few thoughts: 1. I don't understand the claim (in the original email) that its fairly straightforward to prove that asymmetric transition rates cannot be identified using data collected on the tips of a phylogeny It seems like this is something that is routinely done in phylogenetics, and proofs of identifiability of GTR exist (demonstrating that this indeed feasible and not some computational artifact). I agree. This whole discussion may have started from a fact that is *not* fairly straightforward to prove. In which case it is not necessary to bring speciation rates or priors into it. A reversible two-state model should be able to have its parameters estimated on a given tree, clocklike or not. Joe Joe Felsenstein, j...@gs.washington.edu Dept. of Genome Sciences, Univ. of Washington Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] simulate traits evolution in correlated with body mass
Liam Revell wrote: library(phytools) y-r*x+sqrt(1-r^2)*fastBM(tree,sig2=mean(pic(x,tree)^2)) This should give you the correlation r on average and the y|x should be Brownian. (If x is Brownian, then both y x y|x will be too.) If y is evolving in response to x, and x is changing too, then wouldn't the expectation of y be a value that is predicted, not by the current x, but to some extent also from past x's? One would want some much more specific model. *** whining on *** This is part of my stream of oft-repeated, widely-ignored complaints that one ought not regress present-day y's on present-day x's. Lots of people are doing it -- and they're all wrong. *** whining off *** Joe Joe Felsenstein j...@gs.washington.edu Dept of Genome Sciences and Dept of Biology, Univ. of Washington, Box 5065, Seattle Wa 98195-5065 ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] LL ratio test
Ben Bolker -- Now we can apply Fisher's Likelihood Ratio Test of Fisher as well as the AIC. The LRT tells us that the expectation ... under the null hypothesis where M0 (the simpler model) is true? Yup. I'm considering that case to see how the AIC fits in with the LRT. Of course the AIC is proposed for a much wider range of cases. of 2 log(L1) - 2 log(L0) is d1 - d0 (because it is distributed as a Chi-Square with that number of degrees of freedom). But the AIC tells us that the expectation is 2(d1 - d0). Maybe I'm missing something, but I don't see how the AIC tells us something about the expectation of 2 log(L1) - 2 log(L0) ? It gives us the expectation of the Kullback-Leibler distance, which is something like sum(p(i) log(p(i)/q(i)) where p(i) is the true distribution of outcomes and q(i) is the predicted distribution of outcomes ... so it's something more like a marginal log-likelihood difference rather than a maximum log-likelihood difference ... Well, the AIC ends up with comparing -2 log(L) + 2d for the two hypotheses. The difference of these for models M1 and M0 is just (the negative of) 2 log(L1/L0) - 2(d1-d0).Or have I missed something here? So the expectation of the difference is log likelihood *is* described by the AIC, right? And isn't it (in view of Fisher's distribution) wrong too? That is what disturbs me and makes me feel there is something I don't understand about the AIC argument. Joe Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] LL ratio test
Carl Boettiger wrote: Others on the list can weigh in with more authority, but perhaps this will get the discussion started. Yes, it's important to know whether the parameters are nested, and the issue of being at the end of a parameter range is serious. Recall that AIC values are a fequentist statistic: and they obey the very same same distribution as the likelihood ratio, (recall it is a difference log likelihoods, just shifted by the difference in the number of parameters (e.g. -2 [ log L1 - log L0 - (k1 - k0)]). Recall that the maximum likelihood estimate (MLE) is a biased estimate of the likelihood of your data and that AIC penalty is simply creating an asymptotically unbiased** estimator of the true model likelihood, which is a frequentist concept to begin with. Why we report confidence intervals/p-values in the case of one of these statistics but not the other is not obvious to me either. I will confess my relative ignorance of AIC issues (my phylogeny book has a simple, elegant, and clear explanation -- which I wrote in a hurry while excited that I finally understood this, and which turns out to make no sense whatsoever and should be firmly ignored by all). But I do know this: If we have the likelihood ratio R = L(p')/L(p) where p' is the ML parameter values and p is the true parameter values, and where p is in the interior of the set of possible parameters, then RA Fisher showed about 1922 that asymptotically with large amounts of data: 2 log(R) is distributed as chi-square with D degrees of freedom, where D is the difference of the number of parameters being estimated in p' and the number of parameters being estimated in p. Now we know that the expectation of that chi-squared variable is D. So to correct the bias in R we should subtract D. That sounds like what Carl is explaining too. It sounds like a very simple and clear explanation of the AIC. Unfortunately that subtraction is *not* what AIC does. It subtracts 2D. The reason it does so is unclear to me. It involves some kind of prior on models, I think. As far as I am concerned it is like the peace of god, in that it passeth human understanding. Maybe the experts here can give me a simple explanation. Otherwise maybe we should honor Fisher (not me) and only subtract D, and call the result the FIC, But that works only for nested hypotheses, and the main point of the AIC is to deal with non-nested hypotheses. To make matters worse, in my field the AIC has the reputation of too easily favoring the most complex hypothesis, so maybe we should be subtracting more than 2D, not less. Clueless in Seattle. Joe Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] Using AIC/AICc/BIC to select trait a model of trait evolution
Matt Pennell wrote: My question is: how many observations do we have when we compare trait evolutionary models? People tend to use the number of tips of taxa for which we have trait values. However, this may not be technically accurate. First, of course, both the branch lengths and the tip values factor into the likelihood equations so it seems sensible that these are both somehow included as observations. Second, the trait values we observe are of course not independent (that is the whole reason we are using a phylogeny in the first place!!). It is unclear whether/how this fact should factor into our calculation of the n. I know that it phylogenetics, when people do model selection for the model of sequence evolution, they use the number of sites in the alignment though i am not sure there is a clear justification for this either. On just one of these points: number of sites in a molecular alignment is completely justified as the number to use in computing the number of data points, if the evolution at the sites is independent (i.e. independent, given the true phylogeny). In that case each column of the data matrix is drawn i.i.d. from a distribution. Of course, even correlations of rates of evolution among neighboring sites violates this. Whether and how number of species comes in is trickier. Joe Joe Felsenstein j...@gs.washington.edu Dept of Genome Sciences and Dept of Biology, Univ. of Washington, Box 5065, Seattle Wa 98195-5065 [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] Comparative Methods and Pseudo-Traits
Liam Revell wrote: Poaching Intensity = beta0 + beta1*Body Size + e I think it depends on how the residual error in the model is distributed (esp. correlated) among species. It seems possible to invent hypothetical scenarios (as I did in my previous email) about how the residual error in poaching intensity given body size could be phylogenetically autocorrelated, but this is fundamentally an empirical question. If the residual error of poaching intensity given body size is phylogenetically correlated and we ignore this then we risk overestimating the predictive value of our model. In addition, the residual error is likely/guaranteed to be non- Brownian if the response variable is binary (e.g., extant v. extinct). For these type of data the tree should not be ignored, but simple GLS regression is probably not appropriate. One option might be the phylogenetic logistic regression of Ives Garland (2009), but I'm not too familiar with this method. The real problem would come if the characters evolved to respond to the poaching intensity, and that evolution was inherited along the tree. Then our uncertainty about the poaching intensity in the past would be a big problem. But if poaching is only a present-day phenomenon, it would (presumably) respond to only today's characters, and they would not yet have responded to it, so there would be no problem. But I do think it is a Big Mistake (and apparently a frequent one), when people measure the residual of a character regressed on environmental variables, then casually assume that it is undergoing Brownian Motion, when the environmental variables may have been different in the past. That's probably not an issue here. Joe Joe Felsenstein j...@gs.washington.edu Dept of Genome Sciences and Dept of Biology, Univ. of Washington, Box 5065, Seattle Wa 98195-5065 [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] Comparative Methods and Pseudo-Traits
Liam wrote: I'm not sure I entirely agree that we need to assume that the environmental trait is evolving on the tree by Brownian motion. I believe that so long as Y|X (in David's example, growth rate given habitat degradation) is evolving by Brownian motion, we should be OK to use PIC regression. I worry. Why are we to assume that current phenotypes are distributed around optima that are based on *current* environmental variables? Joe --- Joe Felsenstein, j...@gs.washington.edu Department of Genome Sciences and Department of Biology University of Washington Box 355065 Seattle WA 98195-5065 ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] Is correlation of PICs, with tip data and each terminal node split into male and female, a valid method?
Anthony Ives wrote: Just a quick comment on your response to Dan. You can use REML to compare models, but only to compare the variance components and only if they have the same mean components (e.g., contain the same predictor variables). If there are differences in the mean components, you need to use ML, because (very loosely speaking) REML is based on estimating the variances conditional on the mean. I have come across examples where the results using ML and REML (accidentally) are quite different, with ML being correct. This is a good point. But in many cases a REML analysis drops only the grand mean. In effect it is ML, after passing the data through a filter which replaces it by the differences of each point from the grand mean for that character. If the grand mean is not of interest, REML should work as well as ML. If we are trying to test some hypothesis about the value of the grand mean, of course REML is inappropriate. An example would be simple one-way ANOVA which is actually a REML analysis, and it does test means of the groups. Joe Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] Show Informative Sites?
Leandro Jones -- Nick's rules won't allways work in a Parsimony context. For example, a position like this one: 1 A 2 A 3 T 4 C would be informative under rules (a) and (b), but it is in reality uninformative, as any of the possible trees have a length of 2. Thus, this character tells us nothing about _phylogeny_. I disagree. If the only way you can interpret anything is by parsimony, sure. But for statistical phylogenetics, it has information. It works against any phylogeny that has all its branch lengths short, for example. Eliminating that character, not telling the statistical method you did that, and then going ahead with the analysis is a Big Mistake. J.F. Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] Show Informative Sites?
Nick Matzke wrote: All sites are informative under likelihood, ... and under Bayesian methods, and under distance matrix methods too. If you leave out uninformative sites without compensating for doing that by an explicit statistical correction, you will have some very unpleasant surprises. J.F. Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] R: Re: R: ancestral state reconstruction for tips
Pasquale Raia said: Of course Ted is right, but my problem with this computation, or with the simple exercise I was proposing is well another: as a paleontologist I often come across pretty exceptional phenotypes (dwarf hippos and elephants, huge flightless birds, to make a few examples). When you use methods like this (I mean Garland and Ives') and compare the output with those phenotypes, as I did, you immediately realize what the the bottom line is: no matter if they are nodes or tips, by using the expected (under BM) covariance the estimated phenotypes are dull, perfectly reasonable but very different from anything exceptional you may find yourself to work with. This is why I feel it is difficult to rely on those (unobserved) values to begin with. I think that what is being said is that Brownian Motion is too sedate a process and does not predict some of the large changes actually seen in the fossil record. That's a legitimate point but does put the onus on the maker of the point to propose some other stochastic process that is tractable and has these large changes (and that fits with known Mendelian and Darwinian mechanisms). Just complaining that the Brownian stochastic process is no good is insufficient. If we want to add the fossils to the calculation, then they will of course pressure the Brownian Motion process to change more in their vicinity, which may help some. Joe Joe Felsenstein j...@gs.washington.edu Dept of Genome Sciences and Dept of Biology, Univ. of Washington, Box 5065, Seattle Wa 98195-5065 [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] R: Re: R: ancestral state reconstruction for tips
David Bapst wrote: I think Joe is right that realizing a model is an inaccurate or imprecise description of reality should impel us to develop better models of the world around us, because this partly how science moves forward. However, I don't think pointing out that a model is deficient requires that that person must themselves develop an alternative. After all, an alternative model that capture a more realistic level of complexity may not be possible in some situations (it is certainly possible in trait evolution models, however.) Requiring such a thing would put too much pressure on scientific whistle-blowers, who play a very important role in reminding the rest of us that the world is more than the models we use to understand it and make our predictions. As a theoretician, I am perhaps oversensitive to the folks who, after a lecture in which I present a simple model, triumphantly declare but you didn't include predator satiation. Then they walk away looking very pleased with themselves. There is a similar problem with the quibblers who inhabit grant review panels, and are always asking me to do much more complicated models that are impossibly hard (and they are not aware how hard they are). Just understand, when you raise legitimate concerns, that us model-analyzers are also used to getting a lot of these unreasonable demands too, and may be grumpy as a result. Joe Joe Felsenstein j...@gs.washington.edu Dept of Genome Sciences and Dept of Biology, Univ. of Washington, Box 5065, Seattle Wa 98195-5065 [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] R: Re: R: Re: R: ancestral state reconstruction for tips
Folks -- I was intending my most recent message to be apologetic -- that I was perhaps overreactive. Certainly Pas has not raised unreasonable objections or been obstructive with my grants! (Others have). Let me raise an issue so I understand him more clearly: Pas, are you saying that you see phenotypes in the fossils that seem incompatible with the Brownian Motion assumption? Joe Joe Felsenstein j...@gs.washington.edu Dept of Genome Sciences and Dept of Biology, Univ. of Washington, Box 5065, Seattle Wa 98195-5065 [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] R: Re: R: Re: R: Re: R: ancestral state reconstruction for tips
Pas said: What I', trying to do now is writing a R routine to back-calculate the expected branch lengths for the unusual critters, given the fitted ancestral values and tip values of the phenotypes, and assuming BM, in order to compare the actual branch lengths to the expected. The ratio of these lengths, if I'm not delusional and definitely lucky, is a per-lineage rate of phenotypic evolution. The bottom line is answering the question: how long should the branch leading to that particular species be if it evolved at the same rate of its sister species? Good way to approach it. If you can calculate the likelihood of trees, one way would be to not bother fitting any ancestral values: just try different lengths for the branch that connects the fossil to the tree, and see which one maximizes the likelihood. Joe Joe Felsenstein j...@gs.washington.edu Dept of Genome Sciences and Dept of Biology, Univ. of Washington, Box 5065, Seattle Wa 98195-5065 [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] comparative methods at subspecies level
Paolo Piras wrote: I write you in order to ask a phylosophical opinion in applying phylogenetic comparative methods to subspecies belonging to the same species for which a phylogenetic/phylogeographic scenario is known on the basis of molecular studies. I have a number (10) of population samples belonging to two species at continental scale. Some OTUs are surely NOT panmittic with the others while for some others this is not sure. Someone could argument against the use of comparative methods when a gene flow is still present among OTUs. However, I think that, if possible, AT LEAST the covariance due to populations relationships should be removed. You dont think that it is always better to take in account all possible relatedness betwen OTUs? In this paper D. C. BLACKBURN,G. J. MEASEY. 2009.Dispersal to or from an African biodiversity hotspot? Molecular Ecology; Volume 18, Issue 9, pages 19041915, May 2009. the authors apply common comparative methods to a phylogeography of populations belonging to ONE species. I found their strategy appropriate. I wrote to ask your opinion about this issue. Using standard tree-based methods to treat a single species which has populations that exchange migrants is inappropriate. In a 2002 paper I discussed how to do this properly using contrasts which are derived from the eigenvectors and eigenvalues of (an estimate of) the migration matrix between populations. Such migration matrices can now be inferred using programs LAMARC and Migrate. The paper is here: Felsenstein, J. 2002. Contrasts for a within-species comparative method. pp. 118-129 in Modern Developments in Theoretical Population Genetics, ed. M. Slatkin and M. Veuille. Oxford University Press, Oxford. Here http://evolution.gs.washington.edu/papers/spectrum/ is where you can get a PDF of a preprint version of it. The method is univariate at the moment but a multivariate version is under development -- the univariate version can be used to see whether a character has significant covariation with an environmental measurement. The method is also described in a recent paper by Stone, Nee and me: Stone, G. N., S. Nee, and J. Felsenstein. 2011. Controlling for non-independence in comparative analysis of patterns across populations within species. Philosophical Transactions of the Royal Society B 366 (1569): 1410-1424. which discusses the general issue. The (other) authors preferred, for greater comprehensibility, to describe my method slightly incorrectly (I got outvoted). Anyone considering this issue should consider these two papers carefully. Of course mixing within- and between-species analyses is more difficult yet. I hope to release an R package later this year to do the one-character analysis (it is not too hard to put one together yourself in the meantime). Joe Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] pic() vs gls()
Paolo Piras wrote -- Citing http://www.r-phylo.org/wiki/HowTo/Ancestral_State_Reconstruction: Using Felsenstein's (1985) phylogenetic independent contrasts (pic); This is also a Brownian-motion based estimator, but it only takes descendants of each node into account in reconstructing the state at that node. More basal nodes are ignored. I THINK that, on the opposite, more basal nodes are NOT ignored in gls and for this reason results can differ slightly I'm wrong? The contrast algorithm if continued to the root, makes the correct ancestral reconstruction for the root. You are correct that values for higher nodes in the tree are not the correct ML reconstruction for those nodes. If the tree is rerooted at any interior node and the algorithm used for that, then that node's reconstruction will be correct. There are ways of re-using information so that the total effort of doing this for all interior nodes will be no worse than about twice that of a single pass through the tree. However people may prefer to use PGLS, which if properly done should give the proper estimates for all nodes. There is some discussion of this in Rohlf's 2001 paper in Evolution. Joe Joe Felsenstein, j...@gs.washington.edu Dept. of Genome Sciences, Univ. of Washington Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] Simulating BM data on phylogeny
Ted wrote -- Based on zillions of BM simulations we have done with DOS PDSIMUL, you should never see a shift in mean value (versus starting value at root of tree), unless you are intentionally modeling a trend. Something must be wrong. in response to Dean Adams: For these simulations I generate a tree (in this case a perfectly-balanced tree) and simulate 100 data sets on the same phylogeny using a particular initial BM rate parameter (sigma). ... However, the most curious finding is that for all methods, as sigma increases, so too does the mean trait value across the tips (and the converse occurs as sigma decreases). This observation is curious to me, as one should not see a predictable shift in the mean under Brownian motion. Is it possible that the simulation is doing Brownian Motion on some other scale, such as a log scale? If one does BM on the logs and then looks at the original phenotype scale, you *would* expect that as the variance among species increases, so does the mean of the species means. Joe Joe Felsenstein j...@gs.washington.edu Dept of Genome Sciences and Dept of Biology, Univ. of Washington, Box 5065, Seattle Wa 98195-5065 [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] Bootstrap values and NJ when there is no genetic distance between samples
Emmanuel wrote: Is it a problem with ties or with identical sequences? I guess you can solve the latter easily (eg, using the haplotype function in pegas), and this will solve the vast majority of ties. Other cases of ties will certainly not result in such high bootstrap values (that's my intuition). My intuition disagrees with this. If several sequences are nearly identical, but each differs from their consensus at K sites, them if the tree-making algorithm does not randomise addition order of species (or otherwise somehow randomize the resolution of ties), there are likely to be artificially high bootstrap values even with nonzero branch lenghs. 1. Dropping identical sequences is a good thing to do, especially with a distance-based method. One issue is that after bootstrap sampling, some sequences that were not identical may become identical. So the tree-making method ought to be able to handle identical sequences. 2. A high bootstrap support (or another form of support) associated with a zero-length branch is an indication that something's wrong there. Again, it can be a problem even when the branch lengths are nonzero. Joe Joe Felsenstein, j...@gs.washington.edu Dept. of Genome Sciences, Univ. of Washington Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] Felsentsein's contrast LRT
Scott Chamberlain wrote: In Felsenstein's 2008 AmNat paper he states Likelihood ratio rest (LRT) are available of the hypotheis that a set of q characters have on phylogenetic covariation with the remaining p - q characters. And his software contrast gives only one LRT result in the output even if you have say 3 traits. If you are interested in contrasts and a LRT for all pairwise relationships of the 3 traits, do you just run analyses 3 different times, each with the pair of traits you are interested in? You can do all three tests, but they are not independent tests. In Contrast I allow the user to specify two sets of characters and test whether they have evolved independently. If you want to do all three possible tests: 1 versus (2,3), 2 versus (1,3), and 3 versus (1,2) keep in mind that those tests are not independent tests (for example, a strong correlation between 2 and 3 could cause two of the tests to show nonindependence. You would need to run Contrast three times to do those tests. Perhaps what is needed is a test that finds the two sets to divide the characters into, sets that are as much independent as possible. I have no immediate ideas how. J.F. Joe Felsenstein, j...@gs.washington.edu Dept. of Genome Sciences, Univ. of Washington Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] phylogenetically-informed Reduced Major Axis regression in R?
Liam said: Just calculating the slope is straightforward. For tree and column vectors x y (in order tree$tip.label): The relevant point to keep in mind is that once you have made maximum likelihood estimates of the means, variances and covariances of the variables, the Reduced Major Axis is simply a function of these, and its ML estimate is that function of the ML estimates of the covariances. You don't need to do any separate ML estimation for the RMA. If you want to test hypotheses about the RMA, if you can recast them as hypotheses about the slopes and correlations (say that the slope is zero) then the test can be done there, and no separate test of the RMA is needed. In the next release of my program Contrast in PHYLIP, I will have an option to print out the RMA and its other axes, which did not involve anything more complicated than computing them from the covariances that it was already estimating. Joe Joe Felsenstein, j...@gs.washington.edu Dept. of Genome Sciences, Univ. of Washington Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] Simulate node ages given known topology
Gene Hunt said: Brian Sidlauskas did this in an Evolution paper in 2008 (based on Yang and Rannala 1997): Sidlauskas, Brian L. 2008. Continuous and arrested morphological diversification in sister clades of characiform fishes: a phylomorphospace approach. Evolution 62(12): 3135-3156. He had a morphological tree without fossil or molecular branch lengths. The branch lengths were based on rates of speciation, extinction and sampling, and so it would be a different approach than the coalescent theory, I believe. I think Brian has R code for it; you may want to contact him directly. In my 2004 book, on pages 570-571 I note the use of a time transformation that makes simulation of birth-death processes easy. This is based on Bruce Rannala's 1997 work. If you want to start out with the first fork in the tree at time T ago, you want to take your N tip species and choose uniformly from 2, 3, ..., N-1 to choose the number of those tips that are descended from the left branch of the fork. Say this is called K. Then you need to simulate a birth-death process for K tips (the subtree descended from the left fork), and another for N-K tips for the right fork. The result will be a fairly quick algorithm for simulating a BD process that has its oldest fork T units of time ago and leads to exactly N tips. If instead you just want the process to start T units of time ago, and have the oldest fork occur some time after that (determined by the BD process), then that is even easier using the time- transformation trick. I do not know the R packages well enough to know whether one of them implements this, but I am sure someone will comment on that. Joe Joe Felsenstein, j...@gs.washington.edu Dept. of Genome Sciences, Univ. of Washington Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
[R-sig-phylo] Fwd: Simulate node ages given known topology
Andy Rominger wrote: Thanks very much Tanja and Joe, I think these suggestions for sorting will be very useful; we were trying to do something similar (but very hacker-ish) and with this theoretical basis I'm sure we'll pull it off. And for BD, the time transformation should be perfect. It is a very useful tool. If people cite it, they should check Bruce Rannala's paper to see if it is there too. Some closely-related math certainly is. One thing that computing the transformation for various values of B and D (birth and death rates, called lambda and mu in the derivation in my book) is that it gives you a direct idea of how many groups born at each time are missing from the tree owing to having gone extinct before reaching the present day. J.F. Joe Felsenstein, j...@gs.washington.edu Dept. of Genome Sciences, Univ. of Washington Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] Multistate Trait Polymorphism
Luke Harmon said -- Yes Emmanuel is correct, fitDiscrete does not deal with polymorphic data. I have a fix that I made for a specific project that I'm sending to Charles, if anyone else is interested email me off-list. It's very clunky. I suspect this is not just a technical programming issue or a matter of standardizing formats of files, but depends on what you want to assume about the mode of evolution of a polymporphic character. Not a trivial matter at all, and not one where you just want to accept any old arbitrary rule. For example there is a very old (1967) parsimony method called polymorphism parsimony but it makes specific assumptions -- namely that polymorphism is hard to retain along a lineage, easy to lose but hard to regain. So do you want assume that, or what? Joe Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] Branch and Bound Maximum Parsimony (Matthew Vavrek)
Regarding finding all most parsimonious trees by branch-and-bound: Program Penny in my PHYLIP package could be called from within R, driven by batch scripts. You'd have to make them yourself, but it's not hard. However Penny can handle only 0/1 characters, and if there is a multifurcation it will return all the binary trees compatible with that. It is also much slower than PAUP*. It would get you a file of Newick trees. Joe Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] How to detect phylogenetic signal (lambda) in one unscaled trait?
Ted wrote: Following on that, various papers (I can't remember the references) have argued that imagining Brownian-like evolution of body size on a log scale seems reasonable. That is, it should be equally easy for an elephant's body size to evolve 10% as for a mouse's body size to evolve 10%, and to analyze that you want everybody on a log scale. Extending this, you would want to use log(Y/X) or log(Y/[X raised to some allometric slope]). It's just easier to put all variables onto their log scales, so you have log(X), log(Y), log(Z) and then the allometric stuff just corresponds to linear combinations there, which you already have machinery to do. The recommendation to use log scales is a very old one: I talk about it in my Theoretical Evolutionary Genetics free e-text. But is older than that. Falconer has a whole chapter on Scale in his 1960 Introduction of Quantitative Genetics. Sewall Wright has a discussion of it in Chapter 10 of his 1968 first volume of Evolution and the Genetics of Populations (see pages 227ff.). But it is older than those -- for Wright also says (p. 228): Galton, as long ago as 1879, noted that the logarithms of measurements of organisms may be more appropriate than the measurements themselves on the hypothesis that growth factors tend to contribute constant percentage increments rather than constant absolute ones. The old biometrical types of the 1930s and 1940s knew all about this (though taking logarithms was tedious). It is only the more recent researchers who don't know it. Joe Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] bootstrapping with boot.phylo()
Klaus Schliep wrote -- I suspect that some elements in this distance matrix are close to 0.75 than pairwise distances for the bootstrap samples are likely to be equal or greater than 0.75. Most models (K80, JC69 etc.) are not defined for distances =0.75 and will return Inf or NaN (the 0.75 can vary a bit, depending on the substitution model). bionj of course does not like building trees from infinite values as input. With short sequences the variances are of course larger and you are more likely to observe this, that's why your larger data set works fine. However in this cases NaN or Inf are the correct results! I often have to deal with users of my PHYLIP package who are upset at this happening with large distances when the sequences are bootstrapped. (In my package the distance is set to -1.0 in that case, and it should not be used to make a tree). NaN is not the correct value, but Inf is -- the correct distance for (say) the Jukes-Cantor model or the Kimura 2-parameter model when the sequences differ by more than 75% is (positive) infinity, since these are inferred to be unrelated sequences. It would be nice to catch this error with a try and use only trees from finite distance matrices or set infinite values to a large value. But one should return a warning as these samples are likely to be biased. Exactly. J.F. Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] Model-Selection vs. Finding Models that Fit Well
Ted said -- One point for clarification and your further thoughts. The way parameterize the OU process in Lavin et al. (2008) it is a value of zero (not infinity) that gives a start phylogeny with contemporaneous tips. Sometimes the ML estimate of d (what we call it) goes to zero, but more often it may go very small but not zero. In either case, it seems to me you can do a LRT versus a star with one d.f. Can you? Is the value of zero contained within the range of possible values, or at its extreme? I think for LRT you need contained within. Joe Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] trees to matrix and gamma-statistic
Folks -- I believe that the distances predicted from the tree are called patristic distance. They are a quite old concept. I don't have the literature accessible right now but believe they were first described by JS Farris or by Jim Rohlf. Searching on Google using that phrase yields more tools. Cophenetic correlation is the correlation coefficient (the ordinary Pearson correlation) between the observed distances and the patristic distances. Farris pointed out in the early 1970s that minimizing it chooses the same tree as minimizing the unweighted least squares fit of the distances. Joe --- Joe Felsenstein, j...@gs.washington.edu Department of Genome Sciences and Department of Biology University of Washington Box 355065 Seattle WA 98195-5065 ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] trees to matrix and gamma-statistic
Emmanuel wrote -- 2. I understand that the gamma-statistic index of Pybus and Harvey applies only to ultrametric trees, there is a similar index for trees that are not ultrametric? The 'ultrametricity' of the tree gives it a temporal feature (if done properly, of course), so that methods considering the tempo of evolution can be used to analyse such trees. If you cannot get ultrametric dated trees, see the package apTreeshape which has methods to analyse tree shapes, but the range of inferences will usually be less than with dated trees. To rephrase what he said (but probably not change the meaning), if the intent of the gamma statistic is to measure the change through time of the speciation rate (or the difference between the speciation and extinction rates) then you need some way to associate a time with each point in the tree. Lacking an ultrametric tree, you don't have a non-arbitrary way to do that. Joe Joe Felsenstein, j...@gs.washington.edu Dept. of Genome Sciences, Univ. of Washington Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] trees to matrix and gamma-statistic
Folks -- Now that I have had a chance to look up the history: 1. Sokal and Rohlf in 1962 (in Taxon) introduced the cophenetic correlation. 2. They had a distance called the dendrogrammatic distance which was (for the ultrametric trees they considered), half of the patristic distance. 3. Wartren Wagner mentions patristic distance in a 1968 review in Bioscience, but does not claim it. 4. JS Farris's paper showing that the cophenetic correlation is minimized when we achieve a least squares fit of tree to distances is in 1969 in Systematic Zoology. Joe Joe Felsenstein, j...@gs.washington.edu Dept. of Genome Sciences, Univ. of Washington Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] PIC vs. PGLS
Simon Blomberg wrote on 27/09/2010 08:10: (Ted Garland had written:) Another interesting technical point. In general, PIC and PGLS are the same, especially if you stick with a simple Brownian motion model of character evolution. However, their complete mathematical identity has not, to my knowledge, been proven. (Simon Blomberg:) I have a proof. I have a paper submitted to Sys. Biol. on the topic. You can find in my book the R code to get exactly the same coefficients with PICs and PGLS. This works as long as the tree is ultrametric (for equal variance). It's not a formal proof, of course, but a strong suspicion. I can't speak on behalf of PGLS, but if one uses contrasts, the ML analysis is of course using an i.i.d. model with zero expectations of the contrasts. It is provable that this will be identical (getting the same P values and estimates) to a REML (reduced or restricted maximum likelihood) on the full covarying Brownian motion model. If PGLS gets identical results to that, then that proves the identity to contrasts analysis. Joe Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] retrieving simulated ancestral states from sim.char{geiger}
Luke wrote: The simulations in Geiger do draw from a mvn distribution but - in an unnecessary step - draw random numbers for every branch and use matrix multiplication to get the tip values. This is kind of dumb and needlessly slow but does give you the right answer. Carl is correct that simply drawing from a mvn distribution is faster and equivalent. If I had time I'd rewrite the function - maybe I will! Without knowing exactly what your R code is doing, I just want to add to these (correct) ideas that there is one simple way of getting tip and interior node values without much more effort. Work up the tree, drawing a set of normals (one for each character) to get the net changes in that branch. Keep going until you reach the tips. (If the characters are not independent, you also need to have on hand a matrix which contains the square root of the covariances among characters, and multiply the values at each node by that.) This way you don't need a nodes x nodes covariance matrix. Joe Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] Simulate continuous character evolution on phylogenetic trees using different models of evolution
Emmanuel said -- We discussed on the list a few months ago about using any correlation matrix to simulate characters. At the moment, it's tedious to get the correlation matrix from a corStruct object in ape, but I'll improve that soon (following several requests) so that it'll be straightforward to simulate (continous) traits under the available models of evolution. Keep in mind that one can't always assume that the variances (say of Brownian motion) are equal in all characters, so the correlations aren't quite enough. Joe Joe Felsenstein, j...@gs.washington.edu Dept. of Genome Sciences, Univ. of Washington Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] is maximum likeyhood a phylogenetic approach?
Leandro Jones replied to Stephane Bouee: One could argue that your characters are not defined using a phylogenetic criterion; that is, when performing a phylogenetic analysis (either by ML or Parsimony), you have to assume that your dataset is made of homology hypotheses. That is, one could argue that your data are based on similarity relationships rather than on hypotheses of historical transformations. Having this in mind, it is questionable whether, using these data, any method could produce an acceptable estimate of the phylogeny of your group of interest. If one simply measures the same character in different species, is that not good enough? My curiousity is because I suspect that if Stephane had coded his characters 0/1 before trying to analyze them, they would then be accepted as made of homology hypotheses. J.F. Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] is maximum likeyhood a phylogenetic approach?
When I wrote: As what classifications should be, or whether methods should be considered as making phylogenetic or phenetic classifications, I have my own position, that no one else seems to back (in public, anyway). I think that we should not think of these trees as classifications, and not call them phylogenetic classifications or phenetic classifications, but consider them as estimates of the phylogeny. The issue of how to classify is less important anyway. Emmanuel Paradis responded - I have the strong feeling that most users of R and its [phylo]genetics packages are interested in the study of evolutionary processes, not in classification (I rarely see questions about classification or systematics here). So maybe most of us silently back Joe's position. About the issue of how to classify, I think it is very important. The point here is, in my view, that the confusion between classification and evolution greatly hampered the progress of evolutionary biology, but the situation has improved in recent years. I can't speak for most users of R, but I do suspect that Emmanuel is right in that there is agreement with this position among many younger evolutionary biologists. But it is a sufficiently intimidating atmosphere for them that they do not usually say that out loud. I have stuck my neck out, mostly for the fun of it. The reactions among many systematists have been strong -- they are really outraged, and figure that this is just some arbitrary opinion of mine, which they are (barely) willing to tolerate. I suppose the matter will become one of open discussion some day. Anyway, back to R. J.F. Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] is maximum likeyhood a phylogenetic approach?
Elsa et Stéphane Bouee wrote: I am currently doing a classification analysis on morphologic data using modern methods of morphometry (procruste analysis). The variables obtained are quantitative and I would like to use them in a phylogenetic approach. For this purpose I used 2 methods: 1) maximum likelihood with the software phylips and its contml add on; As the author of Contml, which is a program, not an add- on (whatever that means) I can perhaps comment. 2) cladistic analysis on quantitative variables, with the software TNT developed by Goloboff, Farris and Nixon. I am wondering if the max likelihood is really performing a phylogenetic classification or if it is rather a kind of phenetic method. Maybe the answer may vary according to whom I ask the question …? I have submitted an article presenting the max likelihood as a phylogenetic classification but the reviewers are challenging this assertion. Good luck with the reviewers -- I have never been able to influence people who had strong views about parsimony being better than everything else. There's nothing wrong with the Procrustes superposition. But the assumptions of Contml are that each coordinate is independently changing according to a Brownian motion process, and all at the same rate. This is a dubious assumption at best. I have written about this and warned about this for years (Amer J Human Genetics 1973; Evolution 1981; Annual Review of Ecology and Systematics 1988; article in volume on Morphology, Shape, and Phylogenetics 2002). I have an extensive discussion of the issue in my 2004 book, chapter 24. In PHYLIP the documentation file contchars.html discusses it too. Ideally it would be best to have some estimate of the covariation between the characters, and use that to transform your coordinates to independence. I don't know an easy way to do that unless you have a known phylogeny and can estimate the covariances along it using comparative methods software. So the referees are right to be skeptical, but they should be equally skeptical of discrete characters parsimony approaches which make quite similar assumptions. The hardest-core cladistics people do not admit this. As what classifications should be, or whether methods should be considered as making phylogenetic or phenetic classifications, I have my own position, that no one else seems to back (in public, anyway). I think that we should not think of these trees as classifications, and not call them phylogenetic classifications or phenetic classifications, but consider them as estimates of the phylogeny. The issue of how to classify is less important anyway. J.F. Joe Felsenstein, j...@gs.washington.edu Dept. of Genome Sciences, Univ. of Washington Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] PICs: Why force linear regressions through the origin?
Gernot Huber asked -- 1. Why do you have to force a linear regression through the origin when using phylogenetically independent contrasts (PICs)? Because their expectations are zero. A contrast could be either X1 - X2 or X2 - X1, for one thing. The expectations of X1 and X2 are identical so their difference (in either direction) has expectation zero. 2. If you suspect an inverse linear relationship, would you transform one axis so you could still force the regression through the origin? If so, what's the best transformation? The machinery assumes a bivariate normal distribution. So if, say, Y = a(1/X) + b, wouldn't that mean you were assuming that Y and 1/X were distributed as bivariate normal? If so, the distribution of X looks wierd when (1/X) gets near zero. But at least in that case If so, you could do that transformation (1/X) at the beginning. Or are you assuming that Y and X are bivariate normal and then Y = a(1/X) + b? That is dicier since X could in principle get down to zero and then there is a loud explosion on the Y scale. Assuming (ln X, ln Y) is bivariate normal seems safer, especially if the characters have natural limits at zero. Then the regression might just be a simple linear one. Joe Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] Parametric bootstraping
Fernando Fernandez wrote: first time I really have a question I can't solve Departing from the hypothesis that if anything is doable it can be done with R How can I do Parametric bootstrapping on a tree? That is I want to assume a phylogenetic tree as a good model, and then then shuffle leaves (with haplotypes of a different gene for example) to compare the randomised distribution to the a priori tree. (I cannot compare topographies of two trees because I have no good support for one of them, and I still presume this alternative is a good approach) Some people answered questioning this as a good resampling procedure. Just to be the Voice Of Orthodoxy, let me add ... Whatever its merits, it isn't the same as parametric bootstrapping. That involves simulating data along a tree, not shuffling data among leaves. J.F. Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo