[R-sig-phylo] Workshop: DiversificationRates & Macroevolution
We will be offering a workshop on phylogenetic approaches for studying diversification rates to coincide with the North American Evolution meetings in June. Please distribute! == Short course on the analysis of diversification rates from phylogenies: June 22-23 on the campus of Oregon State University, to coincide with the North American Evolution meetings (SSE/ASN/SSB) in Portland (June 23 - June 27). The workshop is funded in part by the National Science Foundation with additional support from Oregon State University and is co-organized by Dr. Dan Rabosky (University of Michigan) and Dr. Brian Sidlauskas (Oregon State University). Travel awards of up to $500 per person are available to cover participation costs. Overview: Rates of speciation, extinction, and phenotypic evolution vary widely across the Tree of Life and through time. This workshop will provide theoretical background and a hands-on practicum in the analysis of lineage diversification rates using time-calibrated phylogenetic trees. Topics will include: — Working with phylogenies in R — Theoretical foundations of diversification models — Developing your intuition for diversification models — Using BAMM to study complex patterns of diversification rate variation on phylogenies — Testing hypotheses about trait-dependent diversification — Assessing the reliability of inferences with BAMM and other methods — Visualizing macroevolutionary dynamics on phylogenies — Studying diversification rates on phylogenies that include fossils Course will primarily be taught by Dan Rabosky (University of Michigan) with contributions from several co-instructors. The course will assume basic proficiency with the R programming/statistical environment and some familiarity with command line interfaces. Example datasets will be provided, but participants are encouraged to bring any phylogenetic dataset they wish to analyze (time calibrated phylogenetic trees). A personal laptop is essential. Workshop participants will arrive in Corvallis (Oregon) any time on Wednesday June 21 and we will depart for Portland on the evening of Friday June 23, such that individuals can attend the Evolution meeting. Shuttles offer easy transport between the Portland Airport and OSU’s campus every two hours, and housing is available at several hotels near campus. Details regarding accommodation and transport will be provided to successful applicants. To apply, please send a CV and a short statement (1-2 paragraphs) detailing your research interests, why you are interested in the course, and your prior experience with R and phylogenetics / comparative methods. Please email your application (or questions) to Dan Rabosky (macroevolution.works...@gmail.com). Applications will be accepted until April 2, 2016, but please apply early as spaces are limited. Target audience is graduate students and postdocs but applications from researchers at other career stages are welcome. Preference will be given to students with a clear interest and research focus in phylogenetics & macroevolution. _____ Dan Rabosky Assistant Professor & Curator of Herpetology Museum of Zoology & Department of Ecology and Evolutionary Biology University of Michigan Ann Arbor, MI 48109-1079 USA drabo...@umich.edu http://www.raboskylab.org http://www.lsa.umich.edu/ummz/ [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
[R-sig-phylo] BAMM / BAMMtools reliability
BAMM and BAMMtools users: We have addressed a recently-published critique of BAMM / BAMMtools on the project website. http://bamm-project.org/replication.html Key results purporting to demonstrate poor statistical performance of BAMM on empirical datasets cannot be replicated under recommended settings for BAMM v2.5+. Most importantly, we strongly recommend that users do not modify hidden and undocumented settings for BAMM, nearly all of which are intended exclusively for BAMM developers. Modification of such settings may negatively impact the performance of the program on empirical datasets. ~Dan Rabosky _ Dan Rabosky Assistant Professor & Curator of Herpetology Museum of Zoology & Department of Ecology and Evolutionary Biology University of Michigan Ann Arbor, MI 48109-1079 USA drabo...@umich.edu http://www.raboskylab.org http://www.lsa.umich.edu/ummz/ [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] getBtimes vs. branching.times
Dear Lev- I don't think branching.times can compute these for non-ultrametric trees. You can do this with package BAMMtools, but you need a "hidden" internal function. You can access it as "BAMMtools:::NU.branching.times" It returns branching times relative to the most recently-occurring tip in the tree. It's a R-based recursion that is a little slower that the ape function, so it's not recommended as a replacement for branching.times if you have an ultrametric tree. I'm not actively maintaining laser, but getBtimes returns the output of branching.times after sorting the times and stripping out the node names (this was useful for something many years ago!). If you plot sort(getBtimes(x)) and sort(branching.times(x)) they should be identical. ~Dan Rabosky On Jan 12, 2016, at 7:37 PM, Yampolsky, Lev <yampo...@mail.etsu.edu> wrote: > Dear Colleagues, > > Does anyone know what is the difference between ape�s branching.times() and > laser�s getBtimes()? > And why they may be giving rather different results, particularly for > internal branches? (From an ultrametric tree created by > chronotree <- chronos(tree, lambda = 1, model = "correlated", quiet = FALSE, > calibration = makeChronosCalib(tree), control = chronos.control()) > > Thank you very much in advance for your help! > > PS. A related but less important question: I am curious how does > branching.times() calculate branching times from a non-ultrametric tree? > > -- > Lev Yampolsky > > Professor > Department of Biological Sciences > East Tennessee State University > Box 70703 > Johnson City TN 37614-1710 > Cell 423-676-7489 > Office/lab 423-439-4359 > Fax423-439-5958 > > ___ > R-sig-phylo mailing list - R-sig-phylo@r-project.org > https://stat.ethz.ch/mailman/listinfo/r-sig-phylo > Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/ _ Dan Rabosky Assistant Professor & Curator of Herpetology Museum of Zoology & Department of Ecology and Evolutionary Biology University of Michigan Ann Arbor, MI 48109-1079 USA drabo...@umich.edu http://www-personal.umich.edu/~drabosky http://www.lsa.umich.edu/ummz/ [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
[R-sig-phylo] BAMM and BAMMtools package: updated, overhauled
R-sig-phylo community: We announce the release of major updates to BAMM and BAMMtools, a Bayesian framework for the analysis of speciation, extinction, and trait evolution on phylogenetic trees (www.bamm-project.org). If you are currently using BAMM/BAMMtools, we strongly recommend switching to BAMMtools v 2.1 (now on CRAN) and BAMM v 2.5 (or higher), now on the project website. There are significant changes to the software, including several important bug fixes. There are also some changes to the use of the software (e.g., BAMMtools now handles calculation of the prior, so there is no need to simulate it with BAMM). We have also added a comprehensive discussion of the likelihood function in BAMM and its assumptions at: http://bamm-project.org/likelihoodmodel.html As we discuss, there are several important assumptions and unresolved issues involving the calculation of likelihoods for "rate shift" models. These issues are common to all methods that purport to compute the likelihood of phylogenetic trees with rate shifts and non-zero extinction rates and are not limited to BAMM. Several issues lack a clear theoretical resolution and we've flagged them prominently to hopefully inspire more attention from the community. The new release of BAMMtools includes an R-based likelihood calculator that computes the likelihood of diversification shifts on phylogenetic trees exactly as implemented in BAMM to enable users to compare BAMM likelihoods to those from other software implementations. We provide examples on the website showing how this function can be used to test whether the BAMM likelihood function is implemented correctly (http://bamm-project.org/likelihoodmodel.html#testlikelihood). If you find any areas of parameter space where this function is poorly behaved, please let us know. We hope that this tool and associated documentation increases transparency of the likelihood calculations themselves as implemented in BAMM. ~Dan Rabosky _____ Dan Rabosky Assistant Professor & Curator of Herpetology Museum of Zoology & Department of Ecology and Evolutionary Biology University of Michigan Ann Arbor, MI 48109-1079 USA drabo...@umich.edu http://www-personal.umich.edu/~drabosky http://www.lsa.umich.edu/ummz/ [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] seemingly conflicting output from BAMM
Hi Chris- Just to add to what Jonathan wrote. This is a good question. You have two basic issues that are being confounded: (1) how much evidence is there for a rate shift overall, versus (2) how much evidence do you have bearing on the locations of specific shifts. In your case, you have limited evidence for rate variation: Bayes factors of 3 -5 versus a model with 0 shifts. That's rather weak evidence for rate variation, but it's (in my opinion) at least worth considering further. You can also see that this is not especially strong evidence from considering the posterior distribution of shifts, which gives a posterior probability of 0.18 to a model with 0 shifts (as an aside, Bayes factors are always going to be more reliable for these types of comparisons because they explicitly take into consideration whatever prior distribution you've specified on the number of shifts). In your case, you have weak overall evidence for a shift somewhere in your data. However, you have even less evidence that a shift occurs at any particular location. This is what you see in your credible shift set: each shift configuration has a shift in a different location. The overall best shift configuration, e.g., the one with the maximum a posteriori (MAP) probability, does not have any core shifts. This is a bit confusing, but is explained at length here: http://bamm-project.org/rateshifts.html Basically, we only consider significant shifts when we enumerate the set of distinct shift configurations in the posterior, where significant means that a shift occurs on a given branch with substantially elevated frequency relative to what you'd expect under the prior (again, this is all explained on the documentation page). These are termed core shifts in BAMM terminology. In your case, the shift configuration with the highest posterior probability actually has no core shifts. However, you can decrease the threshold used to identify core shifts. You can lower the Bayes factor criterion used to identify shifts with elevated posterior probabilities relative to the prior expectation in the credibleShiftSet function by changing the default value for the BFcriterion argument (e.g., if you set BFcriterion = 1, you will probably see a shift in your MAP probability shift configuration). However, tweaking the BFcriterion argument won't change the fact that your dataset has only weak evidence overall for rate heterogeneity among clades. As you decrease the BFcriterion, you will find that the posterior probability of your MAP shift configuration will also drop. Jonathan makes a good point, especially relevant in this case, because the credible shift set here is telling you something important that you won't get out of a single point estimate: you don't have much confidence at all in any particular rate shift. ~Dan Rabosky On Aug 14, 2014, at 8:28 PM, Jonathan Chang wrote: Hi Krzysztof, It certainly can be true that the most credible shift configuration is one where there are no inferred rate shifts, but also prefer a model with one rate shift. The issue is mentioned in the BAMM documentation http://bamm-project.org/rateshifts.html BAMM looks like it found evidence of a rate shift on your phylogeny. However, the exact location of that rate shift is not certain. In your plots, shift configuration #2 shows a rate increase on the upper clade, whereas #3 shows a rate decrease in the lower clade. #4 and #5 tell similar stories. Note that the configurations with 1 rate shift (#2-#5) combined are seen more often than configurations with 0 rate shifts (#1). Personally I'm unclear on how useful the most credible shift configuration actually is. To me that throws away a lot of the power of BAMM by reducing its inference down to a point estimate. Jonathan On Thu, Aug 14, 2014 at 5:08 PM, Krzysztof Kozak kk...@cam.ac.uk wrote: Dear All, I have been asked to analyse my chronogram using BAMM, and I like the idea. Sadly, I am puzzled by the output. I worked through the example and read the entire documentation, but still don't grasp why different analyses suggest different answers. 1. On one hand, several functions suggest that there are 1-2 rate shifts in my data. - Plotting netdiv rate shows it changing somewhat at two times. - plot.bammdata(edata) shows increased rate on the branch leading to a disproportionately large clade - rescaling the branch lengths by the Bayes Factor of a rate shift (bayesFactorBranches) also shows that branches leading to more speciose clades are very long - computeBayesFactors gives this output: 0 1.000 0.2860509 0.2273844 0.3127353 0.3841264 1.091439 0.3605840 1 3.4958818 1.000 0.7949089 1.0932856 1.3428605 3.815542 1.2605592 2 4.3978396 1.2580058 1.000 1.3753596 1.6893262 4.799974 1.5857908 3 3.1975924 0.9146741 0.7270825 1.000 1.2282796 3.489977 1.1530008 4 2.6033098 0.7446790 0.5919520 0.8141469 1.000 2.841354
Re: [R-sig-phylo] autocorrelation nodes in phylogeny
Hi Renske- This is a complex question and it would take a manuscript to answer it properly. The short answer is no, not without doing a lot of work to show that whatever error model you use leads to acceptable Type I error rates under various models of trait evolution. It is almost certainly that case that any off the shelf model will have elevated Type I error rates when applied to BAMM rate estimates. We will soon release a method that does exactly what you would like to do, so stay tuned on this topic. Whether you have power will depend on the number of distinct macroevolutionary rate regimes in your data. If only several, you won't have much power - you can think about the number of rate regimes in your data as (very) loosely equivalent to the effective number of data points you have to test hypotheses about traits and diversification in this framework. If you have evidence for 2 - 5 rate shifts, you should not expect to have much power to detect trait-dependent diversification. ~Dan Rabosky On Jun 9, 2014, at 9:29 AM, Renske Onstein wrote: Dear all, I would like to test the effect of a set of (binary+continuous) traits simultaneously on diversification rates using linear models. I have diversification rates linked to each node and tip in a phylogeny (resulting from BAMM), and I have probabilities of the traits at each node and tip. However, I want to correct for the phylogenetic dependence of both rates and traits. Functions in R using PGLS seem to build a variance co-variance matrix based on data at the tips of the trees, but it seems not possible to assign data to the nodes of the tree. Does anyone know how to do this, or another way in which one can correct for the autocorrelation between nodes in a phylogeny when using linear models? Thanks a lot, Renske -- Renske Onstein PhD student University of Zurich Institute of Systematic Botany Zollikerstrasse 107 8008 Zurich Switzerland onstei...@gmail.com renske.onst...@systbot.uzh.ch [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/ _ Dan Rabosky Assistant Professor Curator of Herpetology Museum of Zoology Department of Ecology and Evolutionary Biology University of Michigan Ann Arbor, MI 48109-1079 USA drabo...@umich.edu http://www-personal.umich.edu/~drabosky http://www.lsa.umich.edu/ummz/ [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] autocorrelation nodes in phylogeny
As a quick follow up, I would add that it is not necessarily the case that your Type I error rates will be inflated if you use (say) a phylogenetic error model. It depends on your sampling. If you run BAMM on a phylogenetic tree of X species, then test a trait-diversification correlation on a greatly reduced dataset (say, 0.1 * X species) using those BAMM rate estimates, you will often find that error rates are not appreciably elevated. In effect, subsampling your datasets like this (using the subtreeBAMM function) can greatly decrease the mean covariance in rates between species that is attributable to the BAMM model itself and lead to acceptable statistical properties. We've done this and found that it behaved acceptably for our purposes, but this is a bit of an ad hoc solution and you will need to assess whether it's appropriate for your data. ~Dan Rabosky On Jun 9, 2014, at 9:29 AM, Renske Onstein wrote: Dear all, I would like to test the effect of a set of (binary+continuous) traits simultaneously on diversification rates using linear models. I have diversification rates linked to each node and tip in a phylogeny (resulting from BAMM), and I have probabilities of the traits at each node and tip. However, I want to correct for the phylogenetic dependence of both rates and traits. Functions in R using PGLS seem to build a variance co-variance matrix based on data at the tips of the trees, but it seems not possible to assign data to the nodes of the tree. Does anyone know how to do this, or another way in which one can correct for the autocorrelation between nodes in a phylogeny when using linear models? Thanks a lot, Renske -- Renske Onstein PhD student University of Zurich Institute of Systematic Botany Zollikerstrasse 107 8008 Zurich Switzerland onstei...@gmail.com renske.onst...@systbot.uzh.ch [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/ _ Dan Rabosky Assistant Professor Curator of Herpetology Museum of Zoology Department of Ecology and Evolutionary Biology University of Michigan Ann Arbor, MI 48109-1079 USA drabo...@umich.edu http://www-personal.umich.edu/~drabosky http://www.lsa.umich.edu/ummz/ [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
[R-sig-phylo] BAMM: Bayesian analyses of macroevolutionary rates
R-sig-phylo community: We announce the release of BAMM and BAMMtools, a Bayesian framework for the analysis of speciation, extinction, and trait evolution on phylogenetic trees (www.bamm-project.org). BAMM is oriented entirely towards detecting and quantifying heterogeneity in evolutionary rates. It uses reversible jump Markov chain Monte Carlo to automatically explore a large number of candidate models of lineage diversification and trait evolution. BAMM is a command line program written in C++. We have also created BAMMtools, an R package for the analysis and visualization of BAMM output. The package is available for installation via CRAN or through the project website. BAMM / BAMMtools functionality includes a number of novel methods for visualization and analysis of complex macroevolutionary dynamics, including: - Inference on evolutionary rate variation through time and among clades - Visualization of dynamic rates on phylogenetic trees with BAMMtools - Estimation of credible sets of macroevolutionary rate configurations - Estimation of the best (maximum a posteriori probability) macroevolutionary rate configuration - Bayes factors for inferring numbers of evolutionary rate regimes on phylogenies - Summarize marginal distributions of evolutionary rates for individual clades - Analyze diversification rates with incomplete and phylogenetically non-random taxon sampling The documentation includes a graph gallery illustrating some of data visualizations and analyses that can be done out of the box with BAMMtools (www.bamm-project.org/bammgraph.html). A conceptual discussion how the BAMM framework differs from stepwise AIC-based approaches can be found at http://bamm-project.org/rateshifts.html. A complete description of the general model that underlies BAMM (and a performance evaluation) is available at: ** Rabosky, DL. 2014. Automatic detection of key innovations, rate shifts, and diversity-dependence on phylogenetic trees. PLoS ONE. http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0089543 *** If you previously installed BAMM or BAMMtools, please download the latest versions. The code for both has changed during the past month and we cannot guarantee compatibility. ~Dan Rabosky _ Dan Rabosky Assistant Professor Curator of Herpetology Museum of Zoology Department of Ecology and Evolutionary Biology University of Michigan Ann Arbor, MI 48109-1079 USA drabo...@umich.edu http://www-personal.umich.edu/~drabosky http://www.lsa.umich.edu/ummz/ ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] PGLS from taxonomy and computational time
Hi Matthew- Rob Freckleton had a short paper last year demonstrating the calculations for PGLS (using Felsenstein's 1973 algorithm) without the need for numerically challenging (perhaps impossible in this case) matrix inversion for a dataset of this size. http://onlinelibrary.wiley.com/doi/10./j.2041-210X.2012.00220.x/abstract Some example code is provided in the appendix (this might even be implemented in Caper or a similar package now). My guess is that this will solve your problems (see Fig 1 from the paper, for example). ~Dan Rabosky On Nov 11, 2013, at 4:16 PM, Matthew Leo Knope wrote: Dear all, I am attempting to run a PGLS in R (please see code below) where I am accounting for phylogenetic affinity (via taxonomy) to the class level and then testing for the potential relationship between 3 basic ecological axes (habitat tiering, motility level, and feeding mode) on body size in ~17K genera of marine fossil bilaterian animals (including inverts, fishes, reptiles, and mammals). However, the computational time for calculating the PGLS is prohibitive (crashes my computer every time and is now running 6 days on a high performance cluster and yet to finish). Is there a way to speed this up somehow or perhaps a better way to do this altogether? Many thanks in advance, Matthew Knope all-read.csv(pgls_all.csv) library(nlme) library(MuMIn) library(arm) library(ape) #log transform loglength-log(all$max_length) #convert numeric to factors tier-factor(all$tiering) mot-factor(all$motility) feed-factor(all$feeding) #to build tree from taxonomy alltree=as.phylo(~phylum/class, data=all) #gives the tree random branch lengths (will try a bunch to see how it impacts the results) alltreerand - compute.brlen(alltree,method=Grafen) #first compute correlation matrix from tree cormatrix-corPagel (1,alltreerand,fixed=FALSE) #to run pgls with tree and random branch lengths pgls_length- gls(loglength~tier+mot+feed, data=all, correlation=cormatrix, method=ML) summary(pgls_length) confint(pgls_length) [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/ _ Dan Rabosky Assistant Professor Curator of Herpetology Museum of Zoology Department of Ecology and Evolutionary Biology University of Michigan Ann Arbor, MI 48109-1079 USA drabo...@umich.edu http://www-personal.umich.edu/~drabosky http://www.lsa.umich.edu/ummz/ [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Substitute for functions on the laser package
Hi all- Thanks for calling these issues to my attention. An updated version of laser should be on cran within the next few days (I just uploaded the package). Many thanks in particular to Klaus Schliep for fixing a number of dependency issues and for improving several of the functions. Klaus, I apologize for missing your original email that included your fixes (I was in the field without email for an extended period of time earlier this year and missed a number of messages that arrived during that time). A comprehensive revision of the package is underway, but the version I just uploaded largely maintains the functionality of the package as it existed before being kicked off of CRAN a few months ago. ~Dan Rabosky On Sep 4, 2013, at 6:20 AM, Emmanuel Paradis wrote: Hi Mariana, You may try the function bd.time in ape. The companion paper gives examples similar to what laser does. Best, Emmanuel -Original Message- From: Mariana Vasconcellos marian...@utexas.edu Sender: r-sig-phylo-boun...@r-project.org Date: Tue, 3 Sep 2013 23:20:17 To: Liam J. Revellliam.rev...@umb.edu Cc: r-sig-phylo@r-project.org Subject: Re: [R-sig-phylo] Substitute for functions on the laser package Thanks, Liam! But, unfortunately I don't know how to build from source. I downloaded Xcode and the laser_2.3.tar.gz. But, I have no idea of how to install from source. If someone could help with any advise, that would be great! Also, does anyone have another option to calculate decreasing speciation rate, increasing extinction rate or both using a different package? Thank you very much! -- Mariana Vasconcellos Ecology, Evolution Behavior Integrative Biology The University of Texas at Austin On Sep 3, 2013, at 10:05 PM, Liam J. Revell liam.rev...@umb.edu wrote: Hi Mariana. You can download old versions of laser from the CRAN archive (http://cran.r-project.org/src/contrib/Archive/laser/); however you will have to build from source. This is easy if you have Xcode (for Mac OS) or a gcc compiler installed. If you do not, and cannot figure out how to install from source, then respond to the list I'm sure someone will help out post a package binary for you. - Liam Liam J. Revell, Assistant Professor of Biology University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://blog.phytools.org On 9/3/2013 10:39 PM, Mariana Vasconcellos wrote: Dear all: I just performed a test of the gamma-statistic on my tree and I found that diversification is not constant in time. So, I would like to perform the function fitSPVAR, fitEXVAR and fitBOTHVAR using the laser package. But, I just saw that the laser package was removed from the CRAN repository. Could anyone tell me what other alternative package I could use to develop and test models of decreasing speciation rate, increasing extinction rate or both? Does MEDUSA do this sort of analyses? Thank you for your help! Mariana [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/ [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/ ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/ _ Dan Rabosky Assistant Professor Curator of Herpetology Museum of Zoology Department of Ecology and Evolutionary Biology University of Michigan Ann Arbor, MI 48109-1079 USA drabo...@umich.edu http://www-personal.umich.edu/~drabosky http://www.lsa.umich.edu/ummz/ [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] asymmetric transitions
. Allman and J. A. Rhodes, The Identifiability of Covarion Models in Phylogenetics, IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 6, no. 1, pp. 7688, Jan. 2009. On Aug 16, 2012, at 10:07 PM, Matt Pennell wrote: correction: the last sentence should have read I wonder how that would work in this case. I think these are important questions going forward. On Thu, Aug 16, 2012 at 11:00 PM, Matt Pennell mwpenn...@gmail.com wrote: Hey all, This has been a really fantastic discussion. Mark, you make some really excellent points in response to my earlier comments. I think you are correct in this. The question that arises out of Jarrod and Dan's simulations (which I have just run) is whether a model selection criteria would be able to distinguish MK from the threshold model that Felsenstein (and Wright before him) put forth? And how do we best assess model adequacy? Carl Boettiger and company (2012: Evolution) suggested a Phylogenetic Monte Carlo approach for continuous characters. I wonder how that would before I think these are important questions going forward. thanks again, matt On Thu, Aug 16, 2012 at 10:43 PM, Dan Rabosky drabo...@umich.edu wrote: Hi all- A couple of points. I am actually less concerned about the Type I error rates I gave in that previous message for the equal rates markov process, even though I think they are real (e.g., I can corroborate them using Diversitree). I don't think it is an issue of ascertainment bias, but I think Mark may be right about the LRT being inappropriate with few events on the tree and this may well explain the matter. This is probably worth exploring further. However, I am much more concerned about Jarrod's second model (with underlying continuous latent variable). This seems to be a serious problem, and if you simulate under the latent model, I think he is right that Type I error rates are really, really high. The model is reasonable: there is a continuous trait that influences the probability of observing a particular tip state. In a practical sense, this probably means the following: (i) some clades in the tree will essentially be fixed for the character in question. (ii) Other clades will appear to have high lability of the character. The clades that are fixed will be those clades where the underlying threshold character (e.g., mean clade value) drifts towards -Inf (or +Inf). Regardless of whether we think about this latent model, this at least leads to an interesting - and probably quite relevant - form of model misspecification. The model essentially induces some extra heterogeneity in rates, such that some clades will appear to be switching quickly and others slowly. However, it is still a symmetric model of sorts. You can simulate data easily under this model and verify (using whatever software) that it is a problem. I'm attaching code that does this. You can play around with 3 parameters: (i) the number of taxa in the analysis (set to 100); (ii) the expected variance of the continuous latency factor (from roots to tip); and (iii) the root state. These parameters are NTAXA, tipvar, and root in the code below. I'm keen to see what others think, but it looks to me like you can simulate very reasonable-looking datasets and obtain extremely strong support for an asymmetric model - even though the model is quasi-symmetric. So, if these hold, then I think this is a serious issue - nothing we routinely do in the analysis of discrete characters is designed to detect this sort of model misspecification. ## A single simulation library(diversitree); library(geiger); library(mvtnorm); NTAXA - 101; # Generate the tree: x - birthdeath.tree(b=1, d=0, taxa.stop=NTAXA); x - drop.tip(x, x$tip.label[x$edge[,2][x$edge.length==0][1]]); vv - vcv.phylo(x); # get phylogenetic vcv matrix # Now we set the expected variance at the tips: # e.g., the value we want for the diagonal of the vcv matrix # If this is = 1, you'll have a phylogenetic standard normal distribution # tipvar - 2; sf - tipvar/max(vv); #get scale factor for vcv matrix vmat - vv*sf; # scale matrix root - 0; #root state: this assumes the root is equally likely to give either state mu - rep(root, nrow(vmat)); #vector of means # Simulate continuous, and then discrete, chars from # the corresponding mvn and binomial distributions chars - rmvnorm(1, mean=mu, sigma=vmat); states - rbinom(length(chars), 1, prob=plogis(chars)); names(states) - x$tip.label; # Look at the data... plot(x, show.tip.label=F); tiplabels(pch=21, bg = c('black', 'white')[(states+1)], col='black', cex=1); Using Diverstree for model fitting: lfx - make.mk2(x, states); # The asymmetric likelihood function lfxcon - constrain(lfx, formulae = list(q01 ~ q10)); #constraining q01 ~ q10 # Estimation... l2
Re: [R-sig-phylo] asymmetric transitions
HI Jarrod- It isn't immediately obvious to me why the exercise below reflects something problematic. In the first scenario, you have a random binary state but with strong differences in frequency. Because there is effectively no phylogenetic signal (as data are simply random), this suggests a high rate of transition between states. I think that such asymmetry in frequencies would be highly unlikely under a symmetric model of character change regardless of the root state. I think it is useful to think about whether a symmetric process could have given rise to a truly random distribution of tip data with strong frequency differences - my intuition is that this is highly unlikely. However, I would be happy to know what others think. Cheers, ~Dan On Aug 16, 2012, at 10:09 AM, Jarrod Hadfield wrote: Hi, I have had a few replies off-list which have made me try and clarify what I mean. I think the distinction needs to be made between two types of probability: the probability that an outcome is 0 or 1 Pr(y| \theta) and the probability density of \theta, Pr(\theta | \gamma), indexed by parameter(s) \gamma. It seems to me that in order to make the problem identifiable an informative prior (or an assumption) is required for the root state. It seems to me that the strong prior Pr(\theta=0.5|\gamma) =1 is used, and then justified because int Pr(y=0| \theta)Pr(theta)dtheta=int Pr(y=1| \theta)Pr(theta)dtheta=0.5. A non-informative prior distribution for \theta could be U(0,1). This also induces a prior on y of the same form, int Pr(y=0| \theta)Pr(theta)dtheta=int Pr(y=1| \theta)Pr(theta)dtheta=0.5, but that is not the main motivation for choosing U(0,1). For example, is this not worrying: library(ape) n-100 tree-rcoal(n) # random tree y-rbinom(n, 1, 0.2) # random data unconnected to the tree m1-ace(y, tree, type = d, model=SYM) m2-ace(y, tree, type = d, model=ARD) anova(m1, m2) # asymmetric evolutionary transition rates strongly supported y-rbinom(n, 1, 0.5) # random data unconnected to the tree but p=0.5 m1-ace(y, tree, type = d, model=SYM) m2-ace(y, tree, type = d, model=ARD) anova(m1, m2) # asymmetric evolutionary transition not supported Cheers, Jarrod Quoting Jarrod Hadfield j.hadfi...@ed.ac.uk on Thu, 16 Aug 2012 12:30:30 +0100: Hi, I have been helping someone with some analyses and came across some routines to estimate asymmetric transition rates between discrete characters. This surprised me because its fairly straightforward to prove that asymmetric transition rates cannot be identified using data collected on the tips of a phylogeny. However when I run these routines (e.g. ace) likelihood ratio tests often suggest strong support for asymmetric rates. This seems to arise because there is an implicit (and unjustified) prior point mass on the ancestral state *probability*. If anyone could point me to reference that states this that would be great? I really don't want to be saying we have support for processes that we need a fossil record, not just a phylogeny, to understand. Cheers, Jarrod -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] asymmetric transitions
Hi All- This is an interesting discussion. I think there is clearly something going on. I do not get catastrophic Type I error rates from this exercise (only 'elevated') with discrete char simulations (an equal rates markov process) - see below. However, Jarrod's latent model seems reasonable and probably a decent approximation of many real phenomena, so it is extremely worrying that Type I error rates would be so high. Also, I don't see why the BiSSE framework would be immune to this problem. If the true process of character change (regardless of its relationship to speciation and extinction) occurs with a latent/liability model, it seems that BiSSE would also be affected. This is something to think (and worry) about. #Simulations using pure birth tree with 100 tips; birthrate = 1.0. # pure birth used as these branch lengths are more consistent with those in a typical interspecific dataset #Rate = 0.01 Mean frequency of root state in pure birth tree of 100 tips: 0.930 Type I error rate: 0.12 # Rate = 0.05. Frequency of root state: 0.86 Type I error rate: 0.08 # Rate = 0.1. Frequency of root state: 0.77 Type I error rate: 0.08 #Rate = 1.0 Frequency of root state: 0.501 Type I error rate: 0.03 #Rate = 10 Frequency of root state: 0.49 Type I error rate: 0.04 #Simulation code library(geiger); # quick fix to deal with tree simulations in Geiger, eg birthdeath.tree # stops at exactly N taxa, but this gives a pair of 0 length branches: x - birthdeath.tree(b=1, d=0, taxa.stop=101); x - drop.tip(x, x$tip.label[x$edge[,2][x$edge.length==0][1]]); # We drop one of the tips with 0 length branches... ## Set up rate matrix: rate1 -1; qmat - list(rbind(c(-rate1, rate1), c(rate1, -rate1))); #FAST rate matrix ## PLot a tree with character states, just to get a feel for phylogenetic ## signal in the character state data: sim -sim.char(x, qmat, model=discrete, n=1, root.state=1); cvec - c('black', 'white'); plot.phylo(x, show.tip.label=F); tiplabels(pch=21, col='black', bg=cvec[sim], cex=0.9); NSIMS - 100; fdif - numeric(NSIMS); # vector for character frequencies pvec - numeric(NSIMS); # vector for pvalues for (i in 1:NSIMS){ cat(i, '\n'); sim -sim.char(x, qmat, model=discrete, n=1, root.state=1); # make sure at least both states are observed in dataset: while (sum(sim==1) == length(sim)){ sim -sim.char(x, qmat, model=discrete, n=1, root.state=1); } tx -table(sim); fdif[i] - tx['1'] / sum(tx); m1 - ace(sim, x, type = 'd', model ='SYM'); m2 - ace(sim, x, type = 'd', model = 'ARD'); lrt - -2*m1$loglik + 2*m2$loglik; pvec[i] - 1 - pchisq(lrt, 1) } mean(fdif); sum(pvec = 0.05); On Aug 16, 2012, at 11:04 AM, Matt Pennell wrote: Jarrod and Dan, While I see what Dan is saying and I agree that evaluating this with non-phylogenetic data is not entirely useful, I think you have stumbled upon a known issue but one that is not widely appreciated. While the MK model is a useful model for discrete characters in many ways, it may be inappropriate for such a test. If you assume that the root is equally likely to be in either state with a lot of tips (i.e. approaching the limit), the ML estimate for the ratio of q01 (transition rate from 0 to 1) to q10 will converge on the ratio between number of tips in state 1 to number of tips in state 0. So if you have a tree where most of the tips are in state 1 then you will get support for the asymmetric change model as this best explains the data. It is a very weird problem and perhaps I am incorrect in this. Regardless, this result does not hold if the discrete character is modeled simulataneously with the branching process of the phylogeny (i.e. BiSSE; Maddison et al. 2007). So, in summation, I mostly agree with you. But this is not shocking behavior of the model. If you are interested, a Bayesian implementation of the Mk model can be found in the package diversitree in the function make.mk. again, i could be off base here. curious to see what others think? matt On Thu, Aug 16, 2012 at 10:58 AM, Dan Rabosky drabo...@umich.edu wrote: HI Jarrod- It isn't immediately obvious to me why the exercise below reflects something problematic. In the first scenario, you have a random binary state but with strong differences in frequency. Because there is effectively no phylogenetic signal (as data are simply random), this suggests a high rate of transition between states. I think that such asymmetry in frequencies would be highly unlikely under a symmetric model of character change regardless of the root state. I think it is useful to think about whether a symmetric process could have given rise to a truly random distribution of tip data with strong frequency differences - my intuition is that this is highly unlikely. However, I would
Re: [R-sig-phylo] asymmetric transitions
:50 to favor the ARD model over the Mk model. In fact, the simulation model is equivalent the ARD model with a rate of evolution approaching infinity, so we should prefer the ARD model. 3. (in reference to Matt's post) In the ARD model, it is possible for the MLE of the equilibrium state frequency of the less frequent state to be 0.5. Presumably, this is a rare occurrence, but I don't agree with the characterization of the ratio of parameters converging to the ratio of states. Consider a clade with lots of tips in state 1 but tiny branch lengths. If this clade is found in the context of a tree with a few other branches that are long and lead to tips with state 0, then you can get an MLE of the state frequency for state 0 being 0.5. Most of the tips will have state 1, but because they are easily explained by one transition to 1 you can still infer that the less frequent state has a higher equilibrium frequency. Perhaps, I'm mis-reading what Matt is referring to when he discusses an analysis of a tree with a lot of tips (ie. approaching the limit). I do agree that if you simulate a very large tree under ARD (with the frequencies not equal to 0.5), then the frequency of the states at the tips will converge to the equilibrium state frequencies. With respect to Dan's results: The Type I error rate of 0.12 troubles me. Have you tried exporting the data and seeing if other software agrees with the likelihood ratios returned by ace() ? I looked at the code for ace and nothing looked amiss to me (though my R skills are virtually non-existent). If the result is corroborated by other software, then my best guesses would be: 1. ascertainment bias (the simulation model clearly excludes constant patterns, but I don't believe that the inference model in ace does any correction for this), and/or 2. the assumption that you can use the chi-square as the null distribution for the LRT probably breaks down when you have very few events on the tree. In some sense the number of events is our measure of the amount of data, and when we have very few events on the tree the asymptotic behavior of the LRT under the null is probably not going to help us. In the limiting case, when rates of character change are so low that you never see homoplasy, then I think the LRT of the the two models should get close to 1 on virtually any realization (conditional on starting in state 1 and having exactly 1 change on the tree, both model make the same predictions about the data; so in this realm the data should not prefer one model over the other). So, I'm not sure how the small data explanation would explain your observation of an excess of large LRT statistics. those are my 2 cents. all the best, Mark _ Dan Rabosky Assistant Professor Dept of Ecology and Evolutionary Biology Museum of Zoology University of Michigan drabo...@umich.edu ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] read.dna warnings and pitfalls
Hi Emmanuel- Thanks for fixing the whitespace issue. I think this fix will be useful to many users. On the issue of recognizing 10 IUPAC characters: I think this is a real problem, and may come up again in short order. Maybe it is just that use of this function has been limited? In the single dataset with a modest number of sequences that caused me problems yesterday, I had the following species and/or genus names - all of which constitute 10 character strings drawn from the set of IUPAC codes: brachyurus (x 2) savannarum graduacauda caudacutus Camarhynchus (x 3) madagascariensis I don't suggest deprecating the phylip sequential, but rather, using something that is compatible with raxml (surely one of the most widely used phylogenetics programs today). I think raxml uses a relaxed sequential version of the phylip format with whitespace delimitation. I could read the same alignment in raxml with no problems, but I had multiple issues when reading the same file with read.dna (including the whitespace character on the first line). My guess is that very few people are using the original phylip format, with its limit of 10 characters per taxon name, and with dna seqs beginning immediately after this. So maybe deprecate sequential phylip, but you could use what Stamatakis calls relaxed sequential PHYLIP, which appears to be: (1) taxon names cannot include spaces but can be up to 100 characters; and (2) names separated from sequences by whitespace character (ideally, this should recognize any number of spaces or tabs to prevent user confusion). For users with tab-delimited raxml files (eg each taxon name separated from its dna sequence by a tab), you can use a regular-expressions enabled text editor (like textwrangler) to quickly find potential problems. Just search for [ACGTUMRWSYKVHDBN]{10}.+\t with grep matching enabled. Cheers, ~Dan On Apr 26, 2012, at 2:16 AM, Emmanuel Paradis wrote: Hi Dan, The reason for this implementation (searching the first 10 IUPAC-coded bases) is because the exact formatting is not inconsistent among different programs. Some files have: 0123456789acgt. that is a 10-character name and the sequence starting on the 11th position. I think this is typical for Phylip. Other software (e.g., PhyML) accepts longer taxa names and require a space before the start of the sequence. About your example: it depends on the order of the data. The following file can be read: 2 10 x AA madagascarAA But if you invert the two sequence lines, it fails. It is the first time I hear about this problem in 9 years, maybe because it requires a particular combination of circumstances. Another drawback of this implementation is that files with less than 10 bases cannot be read. How to solve this? If it were left only to me, I would deprecate the interleaved and sequential formats. FASTA is more flexible, more widespread, easier to parse, can store exactly the same information, and labels are only constrained to be on a single line (but can contain any characters including \n, \t, ...) But I guess many programs use the Phylip formats, so I'd be glad to read other suggestions. As for your 2nd problem, it is now fixed in ape. Best, Emmanuel -Original Message- From: Dan Rabosky drabo...@umich.edu Sender: r-sig-phylo-boun...@r-project.org Date: Wed, 25 Apr 2012 17:51:35 To: r-sig-phylo@r-project.org Subject: [R-sig-phylo] read.dna warnings and pitfalls Hi All- I have spent an inordinate and embarrassing amount of time tracking down an excruciatingly cryptic issue with read.dna, which I rarely use. Here are two key problems: 1) The function automatically assumes it is reading DNA sequences when it encounters a string of 10 continuous DNA-like characters. This includes all characters in the set (ACGTUMRWSYKVHDBN-). This function, unlike the phylip original, does not have limits on taxon name lengths. Hence, I had - in the middle of a large alignment - a species whose name included the string MADAGASCAR, which caused a failure. To be fair, the documentation warns of this, but I think this is extremely easy to overlook, and - moreover - it seems unfortunate to have to parse all your taxon names for a potential IUPAC match before trying to use the function. Presumably, most users who specify sequential spacing will be using whitespace to separate taxon names from DNA sequences, and perhaps it is better to exploit this rather than IUPAC matching. 2) The function is whitespace-sensitive. if you tab-separate the numbers on the first line (numbers of taxa, numbers of sites), you'll receive an errror with the message: the first line of the file must contain the dimensions of the data. It appears that spaces are OK, however. Hopefully this post will be useful to somewhere in the future with a similar issue. Perhaps these can be addressed
[R-sig-phylo] read.dna warnings and pitfalls
Hi All- I have spent an inordinate and embarrassing amount of time tracking down an excruciatingly cryptic issue with read.dna, which I rarely use. Here are two key problems: 1) The function automatically assumes it is reading DNA sequences when it encounters a string of 10 continuous DNA-like characters. This includes all characters in the set (ACGTUMRWSYKVHDBN-). This function, unlike the phylip original, does not have limits on taxon name lengths. Hence, I had - in the middle of a large alignment - a species whose name included the string MADAGASCAR, which caused a failure. To be fair, the documentation warns of this, but I think this is extremely easy to overlook, and - moreover - it seems unfortunate to have to parse all your taxon names for a potential IUPAC match before trying to use the function. Presumably, most users who specify sequential spacing will be using whitespace to separate taxon names from DNA sequences, and perhaps it is better to exploit this rather than IUPAC matching. 2) The function is whitespace-sensitive. if you tab-separate the numbers on the first line (numbers of taxa, numbers of sites), you'll receive an errror with the message: the first line of the file must contain the dimensions of the data. It appears that spaces are OK, however. Hopefully this post will be useful to somewhere in the future with a similar issue. Perhaps these can be addressed in a future update to ape? -Dan Rabosky ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
[R-sig-phylo] read.dna warnings and pitfalls
Hi All- I have spent an inordinate and embarrassing amount of time tracking down an excruciatingly cryptic issue with read.dna, which I rarely use. Here are two key problems: 1) The function automatically assumes it is reading DNA sequences when it encounters a string of 10 continuous DNA-like characters. This includes all characters in the set (ACGTUMRWSYKVHDBN-). This function, unlike the phylip original, does not have limits on taxon name lengths. Hence, I had - in the middle of a large alignment - a species whose name included the string MADAGASCAR, which caused a failure. To be fair, the documentation warns of this, but I think this is extremely easy to overlook, and - moreover - it seems unfortunate to have to parse all your taxon names for a potential IUPAC match before trying to use the function. Presumably, most users who specify sequential spacing will be using whitespace to separate taxon names from DNA sequences, and perhaps it is better to exploit this rather than IUPAC matching. 2) The function is whitespace-sensitive. if you tab-separate the numbers on the first line (numbers of taxa, numbers of sites), you'll receive an errror with the message: the first line of the file must contain the dimensions of the data. It appears that spaces are OK, however. Hopefully this post will be useful to somewhere in the future with a similar issue. Perhaps these can be addressed in a future update to ape? -Dan Rabosky ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] birthdeath {ape} fails every time.
If you want simple fitting of a constant-rate birth-death process to a phylogeny, another option is in diversitree (see make.bd). Last time I checked, bd in laser (and birthdeath in ape) used transformations that kept mu lambda, but this is not the case for diversitree. The package is well-maintained. Treepar might also have something like this as well. Diversitree also allows you to incorporate incomplete taxon sampling. With diversitree, this would basically just be: myLikelihoodFunction - make.bd(myTree, sampling.f = 1.0) pars_init - c(0.1, 0.05) find.mle(myLikelihoodFunction, pars_init) ~Dan Rabosky On Apr 19, 2012, at 5:02 PM, Matt Pennell wrote: Simon, For reasons that I do not fully understand, the problem arises with the use of a coalescent tree (which has a very branch length distribution than a tree generated under a birthdeath process). maybe someone else has some insight into why this causes birthdeath to fail. if you run library(geiger) xx - birthdeath.tree(1, 0, taxa.stop=100) birthdeath(xx) it works. However it does produce warning messages generated in the optimization procedure (which should not be cause to worry; laser's equivalent function just suppresses these warnings). if you really want to use a coalescent tree you can still use library(laser) yy - rcoal(100) bd(branching.times(yy)) hope this helps. matt On Thu, Apr 19, 2012 at 4:52 PM, Simon Greenhill s.greenh...@auckland.ac.nz wrote: Morning all, Whenever I try to use ape's (v 3.0-2) birthdeath function e.g. birthdeath(rcoal(sample(10:100, 1))) I get the following error: Error in while (foo(up) 0) up - up + inc : missing value where TRUE/FALSE needed The warnings all appear to be this sort of thing: 1: In log(1 - a) : NaNs produced 2: In log(exp(r * x[2:N]) - a) : NaNs produced 3: In nlm(function(p) dev(p[1], p[2]), c(0.1, 0.2), hessian = TRUE) : NA/Inf replaced by maximum positive value and seem to be generated from this line: out - nlm(function(p) dev(p[1], p[2]), c(0.1, 0.2), hessian = TRUE) Is there a fix/workaround for this? Kind regards, Simon ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] Estimating MRCA dates
Hi Brian- It has been awhile since I played with chronopl(), but at the time, I convinced myself that results were substantially different from penalized likelihood as implemented in r8s. I haven't explored this exhaustively, but at the time, I was very much convinced that the results of chronopl were potentially problematic. Other folks have mentioned to me that they've observed similar differences between r8s and chronopl and that they've recovered bizarre smoothed trees from chronopl. Perhaps someone with more recent chronopl experience can weigh in on this... ~Dan On Feb 2, 2012, at 6:06 AM, Brian O'Meara wrote: Ape can do this, function chronopl. In general, one good way to find out which package can do something is by looking at the task view for phylogenetics: http://cran.r-project.org/web/views/Phylogenetics.html . This will also allow you to install all the phylogenetics packages (at least those on CRAN) with just a couple of commands. An updated version should be coming out later today (and if someone notices other things missing, please let me know). Brian ___ Brian O'Meara Assistant Professor Dept. of Ecology Evolutionary Biology U. of Tennessee, Knoxville http://www.brianomeara.info Students wanted: Applications due Dec. 15, annually Postdoc collaborators wanted: Check NIMBioS' website Funding wanted: Want to collaborate on a grant? ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] How to compute branching.time for specific node
Hi Axel- You can pull out the node number as you note below: mynode - mrca(treefile)[taxon1, taxon2] and then pull out the relative branching time as branching.times(treefile)[as.character(mynode)] using the as.character() tells R to treet the node number as a name attribute, and since the branching.times() vector has names, it should give you the relevant time. I'm not sure what you mean by append this command, but it sounds like you want to do this to a bunch of trees in a file (?). I'd write a function, read in all your trees, and then use lapply. E.g: trees - read.tree('mytrees.tre') myFx - function(phy, tx1, tx2){ mynode - mrca(treefile)[ tx1, tx2 ] bt - branching.times(phy)[as.character(mynode)] return(bt) } mytimes - lapply( trees, myFx, tx1 = taxon1, tx2 = taxon2 ); #note that using lapply returns a list unlisted.times - unlist(mytimes); #convert 'mytimes' to a vector Cheers, ~Dan On Oct 1, 2011, at 10:53 PM, Schoenhofer, Axel wrote: Hi everybody, I try to wrap my head around how to get a node age from a set of dated ultrametric tree and only for a specific node, to be specified by tip taxa. I'd further like to write them to a file, each value to a new line, to get them in Excel or use them furthe r in R. The following comands I found usefull for my task but could not pierce them together: mrca(treefile)[taxon1,taxon2] # will return the respective node number. I need this as it will give me a node regardless where taxa are in the tree. branching.times(treefile)# problem is that I get a list of all nodes and branching times and I would like to specify it for only the mrca-node. I read something about specifying node.names to use them in combination with branching.times but could neither of those get to work. How would I append this command to a consecutive Newick trees in a file? Thanks so much for any suggestion. Axel [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Daniel L Rabosky Department of Integrative Biology University of California, Berkeley Berkeley, CA 94720 [starting 2012] Assistant Professor Department of Ecology and Evolutionary Biology University of Michigan Assistant Curator of Herpetology, University of Michigan Museum of Zoology [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] fitContinuous in geiger
Hi Carl, Annemarie- While it is possible in principle that Annemarie's results reflect true ML estimate of lambda = 1, I think the practical reason this is occurring is that the default bounds on lambda in fitContinuous are 1e-7 and 1. Because optimization in fitContinuous uses a bounded BFGS algorithm, hard bounds are imposed at those values. You could increase the bounds to lambda 1, and it is possible for lambda to exceed 1. This may not make sense, and it is also possible that the optimization will fail when it tries a lambda 1, because enabling lambda to exceed 1 may result in an NA for the likelihood (Liam has provided some earlier discussion of this issue). But the important point is that no matter what the ML value of lambda, you are limited to those default bounds as specified in fitContinuous - unless you explicitly increase them! This is also relevant to other parameters associated with other trait evolution models, so it is good to ask questions if you observe behavior like thisfor lambda, delta, or anything else. It could even happen to the Brownian rate parameter if the phylogeny branch lengths are small relative to trait variance, because the default upper bound is 20. Cheers, ~Dan On May 23, 2011, at 11:46 AM, Carl Boettiger wrote: Hi Annemarie, No problem, tried to give some answers below. On Mon, May 23, 2011 at 8:05 AM, Annemarie Verkerk annemarie.verk...@mpi.nl wrote: Dear Carl, Liam, and others, thanks for your explanation of what went wrong in the fitContinuous calculations. I set beta to a large number (100) in order to stop it from reading the maximum value. Then, I got exactly the same results for lambda with the non-multiplied and the multiplied data. Okay, I still have two more problems - probably they will sound stupid to you, but I am still very much a newbie to this and if it suffices just to point me to other sources where this has already been discussed please do so! the log likelyhood: between the non-multiplied and the multiplied data, there is still a difference. Liam, you write 'if the variance of this distribution is small, the value of the function will be large (i.e., greater than 1.0)'. However, the variance between the non-multiplied and the multiplied data is exactly the same. So why should log likelyhood values change when data is multiplied? Why do I get a normal looking value (around -200) (with the beta scale set for a very large scale) for the multiplied data? And why does the log likelyhood become so big (-2000) when the initial maximum beta value of 20 is reached if the scale is (0,20)? Liam is referring to the variance of the likelihood distribution, not the variance of the traits. If the diversification rate is high, then any particular outcome is very improbable, so its probability density has high variance and corresponding low value for the prob density at any particular value. Hence the large negative log-likelihood for large beta. (Recall log lik around -2000 means a density of exp(-2000), which is very near zero, illustrating why we need to use logs!) the lambda scores: now, my lambda scores for the non-multiplied and the multiplied data are the same. However, most of them (999 out of 1000) are '1'. To my understanding, '0' and '1' are theoretical limits for lambda that one normally does not reach. So I'm afraid I'm still unsure why this happens, and what it means. You raise an important point here. Just because they are the boundary values, does not imply that these theoretical limits are uncommon -- in fact one may expect to hit the limits more often than any other value. Numerical optimization will often push estimates of a parameter all the way to its boundary. This just means that increasing (deceasing) the parameter value increases the likelihood, so it keeps doing so until it can go no further. This can result from, but does not necessarily imply, that the model is inappropriate (it is also particularly common for small numbers of taxa, for instance), and can be more common on relatively flat likelihood surfaces. So I guess my short answer is this isn't a problem and my longer answer is be suspicious whenever you get back boundary estimates, and consider bootstrapping. HTH, Carl I hope you don't mind this new response to the thread. With kind regards, Annemarie Liam J. Revell wrote: Hi Annemarie, Positive log-likelihoods are not a problem. The log-likelihood is calculated by summing the log probability densities, which come from a function that integrates to 1.0. Thus, if the variance of this distribution is small, the value of the function will be large (i.e., greater than 1.0). The phenomenon of decreasing mean lambda when you increase the scale (i.e., multiply by 10 or 100) is probably due to bounds on beta (in the lambda model, sigma^2) in fitContinuous(). The default upper bound is
Re: [R-sig-phylo] Conditioning on Total Tips in Birth-Death Trees
Hi Dave- Are there any birth-death tree functions which condition on the total number of tips (extinct and extant) on a tree rather than the number of surviving tips? You can recode the birthdeath.tree function from Geiger to do this if you want. The total number of species in the tree (living plus extinct) is equal to nrow(edge)/2, where edge is the edge matrix. Find the repeat statement and replace the line: if (taxa.stop) if (sum(alive) = taxa.stop) break with the following: if (taxa.stop) if ((nrow(edge)/2) = taxa.stop) break; The new birthdeath.tree function should stop when taxa.stop lineages have been born (regardless of whether they've subsequently gone extinct). I was wondering if it was known what the expected probability distribution of branch lengths for a fully-sampled phylogeny (includes all extinct lineages) is? If you have a perfectly sampled tree, branches should be exponentially distributed with rate (lambda + mu). Events occur in the birth-death process with rate (lambda + mu), and the probability that a given event will be speciation is lambda/(lambda + mu). ~Dan I've been trying to figure out, if for a fully extinct tree with homogenous rates, whether we would expect the branch lengths of internal branches to follow an exponential distribution based on the birth parameter (and extinct terminal edges following an exponential distribution based on the death parameter) or if the branch lengths are a function of both parameters. Perhaps I'm not using the right keywords, but my literature searches haven't found the information I'm looking for. Thanks, -Dave -- David Bapst Dept of Geophysical Sciences University of Chicago 5734 S. Ellis Chicago, IL 60637 http://home.uchicago.edu/~dwbapst/ ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] Q: Likelihood ratio test and rate shift test on complete dated phylogeny
Hi Dave- If I understand correctly, that's a separate issue than what Ted is getting at; I think he's worried about the effect of data dredging by running all possible models (as Burnham and Anderson warn against) and not mis-parametrization of the simple model. I don't think I'd call this data dredging. You are still comparing two models, but one has a shift parameter (the two-rate model) and one doesn't. Finding the maximum likelihood estimate of the shift parameter involves looking at the combined likelihood of the data partitions when the shift location is treated as a free parameter. However, assessing the *improvement in fit* of the 2 rate model over the simple 1 rate model is the trickier issue and may require simulation under the null hypothesis to evaluate correctly, as Joseph points out. ~Dan Data-dredging will bias us toward finding complex models that fit overly well. The common solution in data mining is to subset your data and thus confirm a preferred model for one subset by testing the preference in another subset, but I don't think that can be applied to studies with a single tree. -Dave On Tue, Mar 22, 2011 at 12:37 PM, Dan Rabosky drabo...@berkeley.edu wrote: Hi Ted- Others can weigh in here, but - at the very least - I would consider a model with 2 rates as having at least 1 additional parameter (the location of the rate shift). This seems to perform OK in practice. I admit that I have always been a bit uncomfortable with this, though - the null hypothesis model does not have a shift point. I think there are parallels here to finding breakpoints in segmented regression models. Here's a paper I've been meaning to read that might be relevant. Davies, R.B. (1987) Hypothesis testing when a nuisance parameter is present only under the alternative. Biometrika 74, 3343. ~Dan On Mar 22, 2011, at 10:19 AM, tgarl...@ucr.edu tgarl...@ucr.edu wrote: This can also be done using the package diversitree, with the function make.bd.split and variants thereof. You will probably have to write a function to iteratively partition the tree into all possible splits. Hi Dan, How would this approach deal with the problem of multiple comparisons, i.e., in effect testing many different hypotheses (or exploratory data mining) on the same data set and overall tree? Cheers, Ted Theodore Garland, Jr. Professor Department of Biology University of California, Riverside Riverside, CA 92521 Office Phone: (951) 827-3524 Wet Lab Phone: (951) 827-5724 Dry Lab Phone: (951) 827-4026 Home Phone: (951) 328-0820 Facsimile: (951) 827-4286 = Dept. office (not confidential) Email: tgarl...@ucr.edu Main Departmental page: http://www.biology.ucr.edu/people/faculty/Garland.html List of all Publications: http://www.biology.ucr.edu/people/faculty/Garland/GarlandPublications.html Garland and Rose, 2009 http://www.ucpress.edu/books/pages/10604.php Original message Date: Tue, 22 Mar 2011 09:44:26 -0700 From: Dan Rabosky drabo...@berkeley.edu Subject: Re: [R-sig-phylo] Q: Likelihood ratio test and rate shift test on complete dated phylogeny To: R.J. den Tex rjden...@yahoo.com Cc: r-sig-phylo@r-project.org Hi Robert- (i) Can you use a Likelihood ratio test to test whether a birth death model fits your data better than a pure birth model? (likelihood scores are obtained from LASER). Yes, this would be valid, as the models are nested. (ii) How can I test if a rate shift has occurred on a particular node/branch in a complete taxon phylogeny? There are lots of possible methods for this. Or perhaps it is better to say that there are lots of variants of a single general approach. There are the 1-rate versus 2-rate model in LASER (fitNDR_2rate, fitNDR_1rate) after Rabosky et al. (Proc. R. Soc. B, 2007); there is the MEDUSA approach, which would allow more than 1 rate shift across your tree (Alfaro et al., PNAS; this method is implemented in GEIGER). This can also be done using the package diversitree, with the function make.bd.split and variants thereof. You will probably have to write a function to iteratively partition the tree into all possible splits. Actually, I think the diversitree calculations *may* be the most accurate if you have different extinction rates across the tree. There is also the method from Moore and Donoghue (2009, PNAS) - i think tRate is the name of the package - although I don't think this is distributed on the archive for R packages. ~Dan Rabosky Thank you very much beforehand! Robert den Tex PhD student Dept. Evolutionary Biology EBC, Uppsala University Uppsala, Sweden ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo [[alternative HTML
Re: [R-sig-phylo] mntd and mpd without community data
Hi Scott- There are lots of indices to quantify tree shape differences, both in terms of topology (Colless etc) and temporal differences (e.g., Pybus and Harvey's gamma). MNTD and MPD will largely capture the temporal dimension of your trees and will be highly correlated with gamma. For example, if you have long terminal branches in your tree and with very short internal branches, gamma will be negative and both MPD and MNTD will be large. I don't think there is a right or wrong answer here and any index might be appropriate, but you will have to think hard about what exactly you want to quantify (e.g, topological imbalance or asymmetry versus branching time differences) and why. It might be good to think about standardized metrics, to control for differences in the number of taxa in trees. You may want to use several indices in concert. ~Dan Rabosky On Mar 17, 2011, at 8:45 AM, Scott Chamberlain wrote: Hello, I am curious if it is appropriate to calculate mntd (mean nearest taxon distance) and mpd (mean pairwise distance) in the picante package on trees themselves, that is, without community data. We are trying to think of informative metrics that can tell us something about tree shape among lots of different trees. Using mntd and mpd we give the functions a tree and just a vector of all 1's for the community data so that each species is equally abundant. Does this approach make sense? Are there better metrics to use given that we are just dealing with trees without community data? (I am aware of Sackin's, Colless', beta splitting, gamma, etc.) Here is a reproducible example of what I am doing: require(picante) require(ape) tree - rcoal(10) abund - rep(1, 10) names(abund) - tree$tip.label mpd_ - mpd(rbind(abund, abund), # can't have just one vector/community apparently cophenetic(tree))[1] # just one of the two numbers needed as they are the same Sincerely, Scott Chamberlain Rice University, EEB Dept. [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] reverse order plotting of newick tree/phylo object
But can you explain to me what is the rationale behind this? There are only 47 nodes and 54 tips. How can the nodes from 55 to 47 than be rotated? Liam's code is rotating nodes 55+(1:47), rather than 55 to 47. The internal node indexing in ape's phylo class starts with number of tips plus 1. Thus, using rotate on nodes starting with length(phy$tip.label) + 1 means it is starting with node 56 (the root node) and going through length(phy$tip.label) + phy$Nnode, e.g.,, all internal nodes. You can see that 55+(1:47) gives a vector of node numbers that is not the same as 55 to 47. If it isn't working right - maybe it is that you have polytomies? I think you should have n-1 internal nodes (53 in your case). Try multi2di (from ape) for polytomy resolution and see if it works as desired. ~Dan Rabosky Kind regards, Thierry Thierry Janssens Postdoctoral researcher Delft University of Technology Bionanoscience Kavli Institute of Nanoscience Lorentzweg 1 2628LJ Delft the Netherlands Tel: +31 15 2781175 Fax:+31 15 2781202 e-mail: t.k.s.janss...@tudelft.nl -Original Message- From: Liam J. Revell [mailto:liam.rev...@umb.edu] Sent: donderdag 17 maart 2011 16:11 To: Thierry Janssens - TNW Cc: r-sig-phylo@r-project.org Subject: Re: [R-sig-phylo] reverse order plotting of newick tree/phylo object Hi Thierry, There might be a more elegant way to do this, but you can just apply the ape function rotate() to each node number of the tree (excluding tips). I.e. tr2-tree for(i in length(tr2$tip)+1:tr2$Nnode) tr2-rotate(tr2,i) plot(tr2) [rotate() may also be able to take a vector of nodes, but I was not able to get this to work.] - Liam -- Liam J. Revell University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://phytools.blogspot.com On 3/17/2011 10:51 AM, Thierry Janssens - TNW wrote: Dear R-sig-phylo, I am looking for a method to plot an unrooted tre/phylo object e in the reverse order (of the tip labels). Like all the nodes would have rotated. Any of you has an idea? Kind regards, Thierry Thierry Janssens Postdoctoral researcher Delft University of Technology Bionanoscience Kavli Institute of Nanoscience Lorentzweg 1 2628LJ Delft the Netherlands Tel: +31 15 2781175 Fax:+31 15 2781202 e-mail: t.k.s.janss...@tudelft.nlmailto:t.k.s.janss...@tudelft.nl [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] fitContinuous in geiger: positive log-likelihoods when trait values 1
Hi Nick- Are you are getting differences in relative AICs between models from simple rescaling (multiplying by a constant)? The actual values of the traits *might* matter for optimization, depending on various parameters associated with optimization (and whatever algorithm is being used - this should be L-BFGS-B for fitContinuous, I think). So...if relative AICs are different, the first thing I would check is whether some of your model AICs reflect local optima. Do lots of optimizations with random starting parameters, which is usually sufficient - and failing that, you can get into the guts of optim and mess with the actual arguments that control the optimization (e.g., parscale etc). FYI, this is not immediately transparent in Geiger, as many of the functions are hidden. To get at the optimization, look at geiger:::fitContinuousModel which does the heavy lifting within fitContinuous, and if you need to, you can recode the relevant call to optim to have a bit more flexibility. ~Dan On Mar 7, 2011, at 4:04 PM, Nick Matzke wrote: Doh! Really should have remembered that, likelihoods-can-be-greater-than-1 is likelihood 101... I am still a little puzzled by the dramatically different results between rescaling and not, will try to post an example in a sec... [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] phylogenetic analysis of ratios
Hi Ted and others - Here's a further complication. The raw data are actually species' responses to different treatments, such that you have something like: (observed mass) = (species mass) + (response to treatment X) + (response to treatment Y) + (interactions) + etc. Lets say Y_i_j_k is the response of species i to treatment i and k, k and i take values of 0 or 1 depending on whether species receives the treatment. So, Y_i_j_k = mass_i + X_j + Y_k + ... And we are actually interested in analyzing response variables (involving ratios) that are themselves functions of the Y values themselves. I think that we should assume that the treatment effects themselves are to be modeled on the phylogeny, not the response variables. So, you might imagine that (under the null hypothesis) observed mass Y_1, 1, 1 is the sum of 3 random variables drawn from a multivariate normal distribution: species_mass, X, and Y. These 3 should have the same underlying variance-covariance structure, but different constants of proportionality, although Y_1_0_0 would only be drawn only from the 'species_mass' distribution. This has me stumped. Maybe I am making this too hard. ~Dan On Oct 5, 2009, at 1:34 PM, tgarl...@ucr.edu tgarl...@ucr.edu wrote: The idea of simulating the underlying raw data seems good to me, and should be pretty bullettproof. I'd be curious to know what, exactly, the ratios are representing. I think a lot about this for things like organ mass/body mass or leg length/body length, and am always interested to learn how others deal with these sorts of things. Feel free to go off line for explanation if you think it would bore this list. Cheers, Ted Theodore Garland, Jr., Ph.D. Professor Department of Biology University of California Riverside, CA 92521 Phone: (951) 827-3524 = Ted's office (with answering machine) Phone: (951) 827-5724 = Ted's lab Phone: (951) 827-5903 = Dept. office Home Phone: (951) 328-0820 FAX: (951) 827-4286 = Dept. office Email: tgarl...@ucr.edu http://biology.ucr.edu/people/faculty/Garland.html List of all publications with PDF files: http://biology.ucr.edu/people/faculty/Garland/GarlandPublications.html Garland, T., Jr., and M. R. Rose, eds. 2009. Experimental evolution: concepts, methods, and applications of selection experiments. University of California Press, Berkeley, California. http://www.ucpress.edu/books/pages/10604.php Associate Director Network for Experimental Research on Evolution http://nere.bio.uci.edu/ (A University of California Multicampus Research Project) Original message Date: Mon, 5 Oct 2009 16:23:24 -0400 (EDT) From: Daniel Lee Rabosky dl...@cornell.edu Subject: [R-sig-phylo] phylogenetic analysis of ratios To: r-sig-phylo@r-project.org Howdy- I have a two-part problem involving the phylogenetic analysis of ratios. Specifically, I'm interested in the correlation between two ratios X and Y. The problem is actually more complicated than this, because ratios X and Y depend in part on the same underlying data. My understanding is that this is not a simple dependency, eg with X = a/c and Y = b/c, in which case I think log-transforming X and Y would enable us to simply analyze the relationship between log(a) and log(b). [I'm asking on behalf of a collaborator, so I don't have the actual data in front of me]. One solution might involve parametrically simulating the underlying raw data a, b, c etc (independently) under fitted models of phenotypic evolution, then for each replicate calculating the ratios X and Y and using them to construct the null distribution of correlations. This would maintain the variance-covariance structure of the underlying data, and should control for any non-independence due to the use of shared underlying data in X and Y. Any thoughts? ~Dan Rabosky ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
[R-sig-phylo] phylogenetic analysis of ratios
Howdy- I have a two-part problem involving the phylogenetic analysis of ratios. Specifically, I'm interested in the correlation between two ratios X and Y. The problem is actually more complicated than this, because ratios X and Y depend in part on the same underlying data. My understanding is that this is not a simple dependency, eg with X = a/c and Y = b/c, in which case I think log-transforming X and Y would enable us to simply analyze the relationship between log(a) and log(b). [I'm asking on behalf of a collaborator, so I don't have the actual data in front of me]. One solution might involve parametrically simulating the underlying raw data a, b, c etc (independently) under fitted models of phenotypic evolution, then for each replicate calculating the ratios X and Y and using them to construct the null distribution of correlations. This would maintain the variance-covariance structure of the underlying data, and should control for any non-independence due to the use of shared underlying data in X and Y. Any thoughts? ~Dan Rabosky [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
[R-sig-phylo] PHylogenetic analysis of ratios
Howdy- I have a two-part problem involving the phylogenetic analysis of ratios. Specifically, I'm interested in the correlation between two ratios X and Y. The problem is actually more complicated than this, because ratios X and Y depend in part on the same underlying data. My understanding is that this is not a simple dependency, eg with X = a/c and Y = b/c, in which case I think log-transforming X and Y would enable us to simply analyze the relationship between log(a) and log(b). [I'm asking on behalf of a collaborator, so I don't have the actual data in front of me]. One solution might involve parametrically simulating the underlying raw data a, b, c etc (independently) under fitted models of phenotypic evolution, then for each replicate calculating the ratios X and Y and using them to construct the null distribution of correlations. This would maintain the variance-covariance structure of the underlying data, and should control for any non-independence due to the use of shared underlying data in X and Y. Any thoughts? ~Dan Rabosky [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] age of ancestral node
Hi Manoel- It isn't entirely clear what you want to do. Do you want to, for each node in the list, (a) find the parent node, then (b) find the age of that parent node? The basic operation is straightforward: Given some node x and your tree: parent - tree$edge[,1][tree$edge[,2]==x] # gives the parent node of node x branching.times(tree)[as.character(parent)] # gives the divergence time of the parent node You can loop over your list using the above operations. ~Dan On Apr 14, 2009, at 6:19 PM, Manoel Silva wrote: Ok, here's my problem. I have a list of nodes in an ultrametric tree, and I'd like to generate a vector of the ages of its immediate ancestral nodes. I'm sure there is a simple way to do this, but I'm stuck. Any hints? Manoel [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Dan Rabosky Department of Ecology and Evolutionary Biology Fuller Evolutionary Biology Program Cornell Lab of Ornithology Cornell University [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo