[R-sig-phylo] Workshop: DiversificationRates & Macroevolution

2017-03-11 Thread Dan Rabosky
We will be offering a workshop on phylogenetic approaches for studying 
diversification rates to coincide with the North American Evolution meetings in 
June. Please distribute!

==

Short course on the analysis of diversification rates from phylogenies: June 
22-23 on the campus of Oregon State University, to coincide with the North 
American Evolution meetings (SSE/ASN/SSB) in Portland (June 23 - June 27). The 
workshop is funded in part by the National Science Foundation with additional 
support from Oregon State University and is co-organized by Dr. Dan Rabosky 
(University of Michigan) and Dr. Brian Sidlauskas (Oregon State University). 
Travel awards of up to $500 per person are available to cover participation 
costs.

Overview: Rates of speciation, extinction, and phenotypic evolution vary widely 
across the Tree of Life and through time. This workshop will provide 
theoretical background and a hands-on practicum in the analysis of lineage 
diversification rates using time-calibrated phylogenetic trees. Topics will 
include:

— Working with phylogenies in R
— Theoretical foundations of diversification models
— Developing your intuition for diversification models
— Using BAMM to study complex patterns of diversification rate variation on 
phylogenies
— Testing hypotheses about trait-dependent diversification  
— Assessing the reliability of inferences with BAMM and other methods
— Visualizing macroevolutionary dynamics on phylogenies
— Studying diversification rates on phylogenies that include fossils

Course will primarily be taught by Dan Rabosky (University of Michigan) with 
contributions from several co-instructors. The course will assume basic 
proficiency with the R programming/statistical environment and some familiarity 
with command line interfaces. Example datasets will be provided, but 
participants are encouraged to bring any phylogenetic dataset they wish to 
analyze (time calibrated phylogenetic trees). A personal laptop is essential.

Workshop participants will arrive in Corvallis (Oregon) any time on Wednesday 
June 21 and we will depart for Portland on the evening of Friday June 23, such 
that individuals can attend the Evolution meeting. Shuttles offer easy 
transport between the Portland Airport and OSU’s campus every two hours, and 
housing is available at several hotels near campus. Details regarding 
accommodation and transport will be provided to successful applicants.

To apply, please send a CV and a short statement (1-2 paragraphs) detailing 
your research interests, why you are interested in the course, and your prior 
experience with R and phylogenetics / comparative methods. Please email your 
application (or questions) to Dan Rabosky (macroevolution.works...@gmail.com). 
 
Applications will be accepted until April 2, 2016, but please apply early as 
spaces are limited. Target audience is graduate students and postdocs but 
applications from researchers at other career stages are welcome. Preference 
will be given to students with a clear interest and research focus in 
phylogenetics & macroevolution. 

_____
Dan Rabosky
Assistant Professor & Curator of Herpetology
Museum of Zoology &
Department of Ecology and Evolutionary Biology
University of Michigan
Ann Arbor, MI 48109-1079 USA

drabo...@umich.edu
http://www.raboskylab.org
http://www.lsa.umich.edu/ummz/



[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

[R-sig-phylo] BAMM / BAMMtools reliability

2016-08-12 Thread Dan Rabosky
BAMM and BAMMtools users:

We have addressed a recently-published critique of BAMM / BAMMtools on the 
project website.

http://bamm-project.org/replication.html

Key results purporting to demonstrate poor statistical performance of BAMM on 
empirical datasets cannot be replicated under recommended settings for BAMM 
v2.5+. Most importantly, we strongly recommend that users do not modify hidden 
and undocumented settings for BAMM, nearly all of which are intended 
exclusively for BAMM developers. Modification of such settings may negatively 
impact the performance of the program on empirical datasets.

~Dan Rabosky

_
Dan Rabosky
Assistant Professor & Curator of Herpetology
Museum of Zoology &
Department of Ecology and Evolutionary Biology
University of Michigan
Ann Arbor, MI 48109-1079 USA

drabo...@umich.edu
http://www.raboskylab.org
http://www.lsa.umich.edu/ummz/



[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] getBtimes vs. branching.times

2016-01-12 Thread Dan Rabosky

Dear Lev-

I don't think branching.times can compute these for non-ultrametric trees.

You can do this with package BAMMtools, but you need a "hidden" internal 
function. You can access it as 

"BAMMtools:::NU.branching.times"
 
It returns branching times relative to the most recently-occurring tip in the 
tree. It's a R-based recursion that is a little slower that the ape function, 
so it's not recommended as a replacement for branching.times if you have an 
ultrametric tree.

I'm not actively maintaining laser, but getBtimes returns the output of 
branching.times after sorting the times and stripping out the node names (this 
was useful for something many years ago!). If you plot sort(getBtimes(x)) and 
sort(branching.times(x)) they should be identical.

~Dan Rabosky



On Jan 12, 2016, at 7:37 PM, Yampolsky, Lev <yampo...@mail.etsu.edu> wrote:

> Dear Colleagues,
> 
> Does anyone know what is the difference between ape�s  branching.times() and 
> laser�s getBtimes()?
> And why they may be giving rather different results, particularly for 
> internal branches? (From an ultrametric tree created by
> chronotree <- chronos(tree, lambda = 1, model = "correlated", quiet = FALSE, 
> calibration = makeChronosCalib(tree), control = chronos.control())
> 
> Thank you very much in advance for your help!
> 
> PS. A related but less important question: I am curious how does 
> branching.times() calculate branching times from a non-ultrametric tree?
> 
> --
> Lev Yampolsky
> 
> Professor
> Department of Biological Sciences
> East Tennessee State University
> Box 70703
> Johnson City TN 37614-1710
> Cell 423-676-7489
> Office/lab 423-439-4359
> Fax423-439-5958
> 
> ___
> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

_
Dan Rabosky
Assistant Professor & Curator of Herpetology
Museum of Zoology &
Department of Ecology and Evolutionary Biology
University of Michigan
Ann Arbor, MI 48109-1079 USA

drabo...@umich.edu
http://www-personal.umich.edu/~drabosky
http://www.lsa.umich.edu/ummz/



[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

[R-sig-phylo] BAMM and BAMMtools package: updated, overhauled

2015-11-11 Thread Dan Rabosky

R-sig-phylo community:

We announce the release of major updates to BAMM and BAMMtools, a Bayesian 
framework for the analysis of speciation, extinction, and trait evolution on 
phylogenetic trees (www.bamm-project.org).  

If you are currently using BAMM/BAMMtools, we strongly recommend switching to 
BAMMtools v 2.1 (now on CRAN) and BAMM v 2.5 (or higher), now on the project 
website. There are significant changes to the software, including several 
important bug fixes. There are also some changes to the use of the software 
(e.g., BAMMtools now handles calculation of the prior, so there is no need to 
simulate it with BAMM).  

We have also added a comprehensive discussion of the likelihood function in 
BAMM and its assumptions at:

http://bamm-project.org/likelihoodmodel.html

As we discuss, there are several important assumptions and unresolved issues 
involving the calculation of likelihoods for "rate shift" models. These issues 
are common to all methods that purport to compute the likelihood of 
phylogenetic trees with rate shifts and non-zero extinction rates and are not 
limited to BAMM. Several issues lack a clear theoretical resolution and we've 
flagged them prominently to hopefully inspire more attention from the 
community. 

The new release of BAMMtools includes an R-based likelihood calculator that 
computes the likelihood of diversification shifts on phylogenetic trees exactly 
as implemented in BAMM to enable users to compare BAMM likelihoods to those 
from other software implementations. We provide examples on the website showing 
how this function can be used to test whether the BAMM likelihood function is 
implemented correctly 
(http://bamm-project.org/likelihoodmodel.html#testlikelihood). If you find any 
areas of parameter space where this function is poorly behaved, please let us 
know.
 
We hope that this tool and associated documentation increases transparency of 
the likelihood calculations themselves as implemented in BAMM. 
 
 ~Dan Rabosky 


_____
Dan Rabosky
Assistant Professor & Curator of Herpetology
Museum of Zoology &
Department of Ecology and Evolutionary Biology
University of Michigan
Ann Arbor, MI 48109-1079 USA

drabo...@umich.edu
http://www-personal.umich.edu/~drabosky
http://www.lsa.umich.edu/ummz/



[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] seemingly conflicting output from BAMM

2014-08-14 Thread Dan Rabosky

Hi Chris-

Just to add to what Jonathan wrote.

This is a good question. You have two basic issues that are being confounded: 
(1) how much evidence is there for a rate shift overall, versus (2) how much 
evidence do you have bearing on the locations of specific shifts. In your case, 
you have limited evidence for rate variation: Bayes factors of 3 -5 versus a 
model with 0 shifts. That's rather weak evidence for rate variation, but it's 
(in my opinion) at least worth considering further. You can also see that this 
is not especially strong evidence from considering the posterior distribution 
of shifts, which gives a posterior probability of 0.18 to a model with 0 shifts 
(as an aside, Bayes factors are always going to be more reliable for these 
types of comparisons because they explicitly take into consideration whatever 
prior distribution you've specified on the number of shifts).

In your case, you have weak overall evidence for a shift somewhere in your 
data. However, you have even less evidence that a shift occurs at any 
particular location. This is what you see in your credible shift set: each 
shift configuration has a shift in a different location. The overall best 
shift configuration, e.g., the one with the maximum a posteriori (MAP) 
probability, does not have any core shifts. This is a bit confusing, but is 
explained at length here: http://bamm-project.org/rateshifts.html

Basically, we only consider significant shifts when we enumerate the set of 
distinct shift configurations in the posterior, where significant means that 
a shift occurs on a given branch with substantially elevated frequency relative 
to what you'd expect under the prior (again, this is all explained on the 
documentation page). These are termed core shifts in BAMM terminology. In 
your case, the shift configuration with the highest posterior probability 
actually has no core shifts.

However, you can decrease the threshold used to identify core shifts. You can 
lower the Bayes factor criterion used to identify shifts with elevated 
posterior probabilities relative to the prior expectation in the 
credibleShiftSet function by changing the default value for the BFcriterion 
argument (e.g., if you set BFcriterion = 1, you will probably see a shift in 
your MAP probability shift configuration). 

However, tweaking the BFcriterion argument won't change the fact that your 
dataset has only weak evidence overall for rate heterogeneity among clades. As 
you decrease the BFcriterion, you will find that the posterior probability of 
your MAP shift configuration will also drop. Jonathan makes a good point, 
especially relevant in this case, because the credible shift set here is 
telling you something important that you won't get out of a single point 
estimate: you don't have much confidence at all in any particular rate shift.

~Dan Rabosky 




On Aug 14, 2014, at 8:28 PM, Jonathan Chang wrote:

 Hi Krzysztof,
 
 It certainly can be true that the most credible shift configuration is
 one where there are no inferred rate shifts, but also prefer a model
 with one rate shift. The issue is mentioned in the BAMM documentation
 http://bamm-project.org/rateshifts.html
 
 BAMM looks like it found evidence of a rate shift on your phylogeny.
 However, the exact location of that rate shift is not certain. In your
 plots, shift configuration #2 shows a rate increase on the upper
 clade, whereas #3 shows a rate decrease in the lower clade. #4 and #5
 tell similar stories. Note that the configurations with 1 rate shift
 (#2-#5) combined are seen more often than configurations with 0 rate
 shifts (#1).
 
 Personally I'm unclear on how useful the most credible shift
 configuration actually is. To me that throws away a lot of the power
 of BAMM by reducing its inference down to a point estimate.
 
 Jonathan
 
 On Thu, Aug 14, 2014 at 5:08 PM, Krzysztof Kozak kk...@cam.ac.uk wrote:
 Dear All,
 
 I have been asked to analyse my chronogram using BAMM, and I like the idea.
 Sadly, I am puzzled by the output. I worked through the example and read the
 entire documentation, but still don't grasp why different analyses suggest
 different answers.
 
 1. On one hand, several functions suggest that there are 1-2 rate shifts in
 my data.
 - Plotting netdiv rate shows it changing somewhat at two times.
 - plot.bammdata(edata) shows increased rate on the branch leading to a
 disproportionately large clade
 - rescaling the branch lengths by the Bayes Factor of a rate shift
 (bayesFactorBranches) also shows that branches leading to more speciose
 clades are very long
 - computeBayesFactors gives this output:
 0 1.000 0.2860509 0.2273844 0.3127353 0.3841264 1.091439 0.3605840
 1 3.4958818 1.000 0.7949089 1.0932856 1.3428605 3.815542 1.2605592
 2 4.3978396 1.2580058 1.000 1.3753596 1.6893262 4.799974 1.5857908
 3 3.1975924 0.9146741 0.7270825 1.000 1.2282796 3.489977 1.1530008
 4 2.6033098 0.7446790 0.5919520 0.8141469 1.000 2.841354

Re: [R-sig-phylo] autocorrelation nodes in phylogeny

2014-06-09 Thread Dan Rabosky

Hi Renske-

This is a complex question and it would take a manuscript to answer it 
properly. The short answer is no, not without doing a lot of work to show that 
whatever error model you use leads to acceptable Type I error rates under 
various models of trait evolution. It is almost certainly that case that any 
off the shelf model will have elevated Type I error rates when applied to BAMM 
rate estimates. 

We will soon release a method that does exactly what you would like to do, so 
stay tuned on this topic.

Whether you have power will depend on the number of distinct macroevolutionary 
rate regimes in your data. If only several, you won't have much power - you can 
think about the number of rate regimes in your data as (very) loosely 
equivalent to the effective number of data points you have to test hypotheses 
about traits and diversification in this framework. If you have evidence for 2 
- 5 rate shifts, you should not expect to have much power to detect 
trait-dependent diversification.

~Dan Rabosky



On Jun 9, 2014, at 9:29 AM, Renske Onstein wrote:

 Dear all,
 
 
 I would like to test the effect of a set of (binary+continuous) traits
 simultaneously on diversification rates using linear models. I have
 diversification rates linked to each node and tip in a phylogeny (resulting
 from BAMM), and I have probabilities of the traits at each node and tip.
 However, I want to correct for the phylogenetic dependence of both rates
 and traits. Functions in R using PGLS seem to build a variance co-variance
 matrix based on data at the tips of the trees, but it seems not possible to
 assign data to the nodes of the tree.
 
 Does anyone know how to do this, or another way in which one can correct
 for the autocorrelation between nodes in a phylogeny when using linear
 models?
 
 
 Thanks a lot,
 
 Renske
 
 -- 
 Renske Onstein
 PhD student
 University of Zurich
 Institute of Systematic Botany
 Zollikerstrasse 107
 8008 Zurich
 Switzerland
 onstei...@gmail.com renske.onst...@systbot.uzh.ch
 
   [[alternative HTML version deleted]]
 
 ___
 R-sig-phylo mailing list - R-sig-phylo@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
 Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

_
Dan Rabosky
Assistant Professor  Curator of Herpetology
Museum of Zoology 
Department of Ecology and Evolutionary Biology
University of Michigan
Ann Arbor, MI 48109-1079 USA

drabo...@umich.edu
http://www-personal.umich.edu/~drabosky
http://www.lsa.umich.edu/ummz/



[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] autocorrelation nodes in phylogeny

2014-06-09 Thread Dan Rabosky

As a quick follow up, I would add that it is not necessarily the case that your 
Type I error rates will be inflated if you use (say) a phylogenetic error 
model. It depends on your sampling. If you run BAMM on a phylogenetic tree of X 
species, then test a trait-diversification correlation on a greatly reduced 
dataset (say, 0.1 * X species) using those BAMM rate estimates, you will often 
find that error rates are not appreciably elevated. In effect, subsampling your 
datasets like this (using the subtreeBAMM function) can greatly decrease the 
mean covariance in rates between species that is attributable to the BAMM model 
itself and lead to acceptable statistical properties.

We've done this and found that it behaved acceptably for our purposes, but this 
is a bit of an ad hoc solution and you will need to assess whether it's 
appropriate for your data.

~Dan Rabosky




On Jun 9, 2014, at 9:29 AM, Renske Onstein wrote:

 Dear all,
 
 
 I would like to test the effect of a set of (binary+continuous) traits
 simultaneously on diversification rates using linear models. I have
 diversification rates linked to each node and tip in a phylogeny (resulting
 from BAMM), and I have probabilities of the traits at each node and tip.
 However, I want to correct for the phylogenetic dependence of both rates
 and traits. Functions in R using PGLS seem to build a variance co-variance
 matrix based on data at the tips of the trees, but it seems not possible to
 assign data to the nodes of the tree.
 
 Does anyone know how to do this, or another way in which one can correct
 for the autocorrelation between nodes in a phylogeny when using linear
 models?
 
 
 Thanks a lot,
 
 Renske
 
 -- 
 Renske Onstein
 PhD student
 University of Zurich
 Institute of Systematic Botany
 Zollikerstrasse 107
 8008 Zurich
 Switzerland
 onstei...@gmail.com renske.onst...@systbot.uzh.ch
 
   [[alternative HTML version deleted]]
 
 ___
 R-sig-phylo mailing list - R-sig-phylo@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
 Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

_
Dan Rabosky
Assistant Professor  Curator of Herpetology
Museum of Zoology 
Department of Ecology and Evolutionary Biology
University of Michigan
Ann Arbor, MI 48109-1079 USA

drabo...@umich.edu
http://www-personal.umich.edu/~drabosky
http://www.lsa.umich.edu/ummz/



[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


[R-sig-phylo] BAMM: Bayesian analyses of macroevolutionary rates

2014-03-07 Thread Dan Rabosky

R-sig-phylo community:

We announce the release of BAMM and BAMMtools, a Bayesian framework for the 
analysis of speciation, extinction, and trait evolution on phylogenetic trees 
(www.bamm-project.org). BAMM is oriented entirely towards detecting and 
quantifying heterogeneity in evolutionary rates. It uses reversible jump Markov 
chain Monte Carlo to automatically explore a large number of candidate models 
of lineage diversification and trait evolution. BAMM is a command line program 
written in C++. 

We have also created BAMMtools, an R package for the analysis and visualization 
of BAMM output. The package is available for installation via CRAN or through 
the project website. 

BAMM / BAMMtools functionality includes a number of novel methods for 
visualization and analysis of complex macroevolutionary dynamics, including:

- Inference on evolutionary rate variation through time and among clades
- Visualization of dynamic rates on phylogenetic trees with BAMMtools
- Estimation of credible sets of macroevolutionary rate configurations
- Estimation of the best (maximum a posteriori probability) macroevolutionary 
rate configuration
- Bayes factors for inferring numbers of evolutionary rate regimes on 
phylogenies
- Summarize marginal distributions of evolutionary rates for individual clades
- Analyze diversification rates with incomplete and phylogenetically non-random 
taxon sampling

The documentation includes a graph gallery illustrating some of data 
visualizations and analyses that can be done out of the box with BAMMtools 
(www.bamm-project.org/bammgraph.html). A conceptual discussion how the BAMM 
framework differs from stepwise AIC-based approaches can be found at 
http://bamm-project.org/rateshifts.html. 

A complete description of the general model that underlies BAMM (and a 
performance evaluation) is available at:

** Rabosky, DL. 2014. Automatic detection of key innovations, rate shifts, and 
diversity-dependence on phylogenetic trees. PLoS ONE. 
http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0089543

*** If you previously installed BAMM or BAMMtools, please download the latest 
versions. The code for both has changed during the past month and we cannot 
guarantee compatibility. 

~Dan Rabosky

_

Dan Rabosky
Assistant Professor  Curator of Herpetology
Museum of Zoology 
Department of Ecology and Evolutionary Biology
University of Michigan
Ann Arbor, MI 48109-1079 USA

drabo...@umich.edu
http://www-personal.umich.edu/~drabosky
http://www.lsa.umich.edu/ummz/

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] PGLS from taxonomy and computational time

2013-11-11 Thread Dan Rabosky

Hi Matthew-

Rob Freckleton had a short paper last year demonstrating the calculations for 
PGLS (using Felsenstein's 1973 algorithm) without the need for numerically 
challenging (perhaps impossible in this case) matrix inversion for a dataset of 
this size. 

http://onlinelibrary.wiley.com/doi/10./j.2041-210X.2012.00220.x/abstract

Some example code is provided in the appendix (this might even be implemented 
in Caper or a similar package now). My guess is that this will solve your 
problems (see Fig 1 from the paper, for example).

~Dan Rabosky



On Nov 11, 2013, at 4:16 PM, Matthew Leo Knope wrote:

 Dear all,
 
 I am attempting to run a PGLS in R (please see code below) where I am 
 accounting for phylogenetic affinity (via taxonomy) to the class level and 
 then testing for the potential relationship between 3 basic ecological axes 
 (habitat tiering, motility level, and feeding mode) on body size in ~17K 
 genera of marine fossil bilaterian animals (including inverts, fishes, 
 reptiles, and mammals).
 
 However, the computational time for calculating the PGLS is prohibitive 
 (crashes my computer every time and is now running 6 days on a high 
 performance cluster and yet to finish).  Is there a way to speed this up 
 somehow or perhaps a better way to do this altogether?
 
 Many thanks in advance,
 
 Matthew Knope
 
 all-read.csv(pgls_all.csv)
 
 library(nlme)
 library(MuMIn)
 library(arm)
 library(ape)
 
 #log transform
 loglength-log(all$max_length)
 
 #convert numeric to factors
 tier-factor(all$tiering)
 mot-factor(all$motility)
 feed-factor(all$feeding)
 
 #to build tree from taxonomy
 alltree=as.phylo(~phylum/class, data=all)  
 
 #gives the tree random branch lengths (will try a bunch to see how it impacts 
 the results)
 alltreerand - compute.brlen(alltree,method=Grafen) 
 
 #first compute correlation matrix from tree
 cormatrix-corPagel (1,alltreerand,fixed=FALSE)
 
 #to run pgls with tree and random branch lengths
 pgls_length- gls(loglength~tier+mot+feed, data=all, correlation=cormatrix, 
 method=ML)
 summary(pgls_length)
 confint(pgls_length)
   [[alternative HTML version deleted]]
 
 ___
 R-sig-phylo mailing list - R-sig-phylo@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
 Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

_
Dan Rabosky
Assistant Professor  Curator of Herpetology
Museum of Zoology 
Department of Ecology and Evolutionary Biology
University of Michigan
Ann Arbor, MI 48109-1079 USA

drabo...@umich.edu
http://www-personal.umich.edu/~drabosky
http://www.lsa.umich.edu/ummz/



[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] Substitute for functions on the laser package

2013-09-04 Thread Dan Rabosky

Hi all-

Thanks for calling these issues to my attention. An updated version of laser 
should be on cran within the next few days (I just uploaded the package). 

Many thanks in particular to Klaus Schliep for fixing a number of dependency 
issues and for improving several of the functions. Klaus, I apologize for 
missing your original email that included your fixes (I was in the field 
without email for an extended period of time earlier this year and missed a 
number of messages that arrived during that time).

A comprehensive revision of the package is underway, but the version I just 
uploaded largely maintains the functionality of the package as it existed 
before being kicked off of CRAN a few months ago.

~Dan Rabosky





On Sep 4, 2013, at 6:20 AM, Emmanuel Paradis wrote:

 Hi Mariana,
 
 You may try the function bd.time in ape. The companion paper gives examples 
 similar to what laser does.
 
 Best,
 
 Emmanuel
 -Original Message-
 From: Mariana Vasconcellos marian...@utexas.edu
 Sender: r-sig-phylo-boun...@r-project.org
 Date: Tue, 3 Sep 2013 23:20:17 
 To: Liam J. Revellliam.rev...@umb.edu
 Cc: r-sig-phylo@r-project.org
 Subject: Re: [R-sig-phylo] Substitute for functions on the laser package
 
 Thanks, Liam! But, unfortunately I don't know how to build from source. I 
 downloaded Xcode and the laser_2.3.tar.gz. But, I have no idea of how to 
 install from source. If someone could help with any advise, that would be 
 great!  Also, does anyone have another option to calculate decreasing 
 speciation rate, increasing extinction rate or both using a different package?
 
 Thank you very much!
 
 --
 Mariana Vasconcellos
 Ecology, Evolution  Behavior
 Integrative Biology
 The University of Texas at Austin
 
 
 
 
 On Sep 3, 2013, at 10:05 PM, Liam J. Revell liam.rev...@umb.edu wrote:
 
 Hi Mariana.
 
 You can download old versions of laser from the CRAN archive 
 (http://cran.r-project.org/src/contrib/Archive/laser/); however you will 
 have to build from source. This is easy if you have Xcode (for Mac OS) or a 
 gcc compiler installed. If you do not, and cannot figure out how to install 
 from source, then respond to the list  I'm sure someone will help out  
 post a package binary for you.
 
 - Liam
 
 Liam J. Revell, Assistant Professor of Biology
 University of Massachusetts Boston
 web: http://faculty.umb.edu/liam.revell/
 email: liam.rev...@umb.edu
 blog: http://blog.phytools.org
 
 On 9/3/2013 10:39 PM, Mariana Vasconcellos wrote:
 Dear all:
 I just performed a test of the gamma-statistic on my tree and I found that 
 diversification is not constant in time. So, I would like to perform the 
 function fitSPVAR, fitEXVAR and fitBOTHVAR using the laser package. But, I 
 just saw that the laser package was removed from the CRAN repository. Could 
 anyone tell me what other alternative package I could use to develop and 
 test models of decreasing speciation rate, increasing extinction rate or 
 both? Does MEDUSA do this sort of analyses?
 
 Thank you for your help!
 Mariana
 
 
 
 
 
 
 [[alternative HTML version deleted]]
 
 ___
 R-sig-phylo mailing list - R-sig-phylo@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
 Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
 
 
 
   [[alternative HTML version deleted]]
 
 ___
 R-sig-phylo mailing list - R-sig-phylo@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
 Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
 ___
 R-sig-phylo mailing list - R-sig-phylo@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
 Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

_
Dan Rabosky
Assistant Professor  Curator of Herpetology
Museum of Zoology 
Department of Ecology and Evolutionary Biology
University of Michigan
Ann Arbor, MI 48109-1079 USA

drabo...@umich.edu
http://www-personal.umich.edu/~drabosky
http://www.lsa.umich.edu/ummz/



[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] asymmetric transitions

2012-08-17 Thread Dan Rabosky
. Allman and J. A. Rhodes, “The Identifiability of Covarion Models 
 in Phylogenetics,” IEEE/ACM Transactions on Computational Biology and 
 Bioinformatics, vol. 6, no. 1, pp. 76–88, Jan. 2009.
 
 
 
 
 On Aug 16, 2012, at 10:07 PM, Matt Pennell wrote:
 
 correction: the last sentence should have read
 
 I wonder how that would work in this case. I think these are important 
 questions going forward.
 
 On Thu, Aug 16, 2012 at 11:00 PM, Matt Pennell mwpenn...@gmail.com wrote:
 Hey all,
 
 This has been a really fantastic discussion. Mark, you make some really 
 excellent points in response to my earlier comments. I think you are 
 correct in this.
 
 The question that arises out of Jarrod and Dan's simulations (which I have 
 just run) is whether a model selection criteria would be able to 
 distinguish MK from the threshold model that Felsenstein (and Wright before 
 him) put forth? And how do we best assess model adequacy? Carl Boettiger 
 and company (2012: Evolution) suggested a Phylogenetic Monte Carlo approach 
 for continuous characters. I wonder how that would before  I think these 
 are important questions going forward.
 
 thanks again,
 matt
 
 
 
 On Thu, Aug 16, 2012 at 10:43 PM, Dan Rabosky drabo...@umich.edu wrote:
 
 Hi all-
 
 A couple of points. I am actually less concerned about the Type I error 
 rates I gave in that previous message for the equal rates markov process, 
 even though I think they are real (e.g., I can corroborate them using 
 Diversitree). I don't think it is an issue of ascertainment bias, but I 
 think Mark may be right about the LRT being inappropriate with few events 
 on the tree and this may well explain the matter. This is probably worth 
 exploring further.
 
 However, I am much more concerned about Jarrod's second model (with 
 underlying continuous latent variable). This seems to be a serious problem, 
 and if you simulate under the latent model, I think he is right that Type I 
 error rates are really, really high. The model is reasonable: there is a 
 continuous trait that influences the probability of observing a particular 
 tip state. In a practical sense, this probably means the following: (i) 
 some clades in the tree will essentially be fixed for the character in 
 question. (ii) Other clades will appear to have high lability of the 
 character. The clades that are fixed will be those clades where the 
 underlying threshold character (e.g., mean clade value) drifts towards -Inf 
 (or +Inf). Regardless of whether we think about this latent model, this at 
 least leads to an interesting - and probably quite relevant - form of model 
 misspecification. The model essentially induces some extra heterogeneity in 
 rates, such that some clades will appear to be switching quickly and others 
 slowly. However, it is still a symmetric model of sorts.
 
 You can simulate data easily under this model and verify (using whatever 
 software) that it is a problem. I'm attaching code that does this. You can 
 play around with 3 parameters: (i) the number of taxa in the analysis (set 
 to 100); (ii) the expected variance of the continuous latency factor (from 
 roots to tip); and (iii) the root state. These parameters are NTAXA, 
 tipvar, and root in the code below.
 
 I'm keen to see what others think, but it looks to me like you can simulate 
 very reasonable-looking datasets and obtain extremely strong support for an 
 asymmetric model - even though the model is quasi-symmetric. So, if these 
 hold, then I think this is a serious issue - nothing we routinely do in the 
 analysis of discrete characters is designed to detect this sort of model 
 misspecification.
 
 ## A single simulation
 
 library(diversitree);
 library(geiger);
 library(mvtnorm);
 
 NTAXA - 101;
 
 # Generate the tree:
 x - birthdeath.tree(b=1, d=0, taxa.stop=NTAXA);
 x - drop.tip(x, x$tip.label[x$edge[,2][x$edge.length==0][1]]);
 
 
 vv - vcv.phylo(x); # get phylogenetic vcv matrix
 
 # Now we set the expected variance at the tips:
 #   e.g., the value we want for the diagonal of the vcv matrix
 #   If this is = 1, you'll have a phylogenetic standard normal 
 distribution
 #
 
 tipvar - 2;
 sf - tipvar/max(vv); #get scale factor for vcv matrix
 
 vmat - vv*sf; # scale matrix
 root - 0; #root state: this assumes the root is equally likely to give 
 either state
 
 mu - rep(root, nrow(vmat));  #vector of means
 
 
 # Simulate continuous, and then discrete, chars from
 #   the corresponding mvn and binomial distributions
 chars - rmvnorm(1, mean=mu, sigma=vmat);
 states - rbinom(length(chars), 1, prob=plogis(chars));
 names(states) - x$tip.label;
 
 # Look at the data...
 plot(x, show.tip.label=F);
 tiplabels(pch=21, bg = c('black', 'white')[(states+1)], col='black', cex=1);
 
  Using Diverstree for model fitting:
 lfx - make.mk2(x, states); # The asymmetric likelihood function
 lfxcon - constrain(lfx, formulae = list(q01 ~ q10)); #constraining q01 ~ 
 q10
 
 # Estimation...
 l2

Re: [R-sig-phylo] asymmetric transitions

2012-08-16 Thread Dan Rabosky

HI Jarrod-

It isn't immediately obvious to me why the exercise below reflects something 
problematic. In the first scenario, you have a random binary state but with 
strong differences in frequency. Because there is effectively no phylogenetic 
signal (as data are simply random), this suggests a high rate of transition 
between states. I think that such asymmetry in frequencies would be highly 
unlikely under a symmetric model of character change regardless of the root 
state. I think it is useful to think about whether a symmetric process could 
have given rise to a truly random distribution of tip data with strong 
frequency differences - my intuition is that this is highly unlikely. However, 
I would be happy to know what others think.

Cheers,
~Dan


On Aug 16, 2012, at 10:09 AM, Jarrod Hadfield wrote:

 Hi,
 
 I have had a few replies off-list which have made me try and clarify what I 
 mean.  I think the distinction needs to be made between two types of 
 probability: the probability  that an outcome is 0 or 1 Pr(y| \theta) and the 
 probability density of \theta, Pr(\theta | \gamma), indexed by parameter(s) 
 \gamma.  It seems to me that in order to make the problem identifiable an 
 informative prior (or an assumption) is required for the root state.  It 
 seems to me that the strong prior Pr(\theta=0.5|\gamma) =1 is used, and then 
 justified because int Pr(y=0| \theta)Pr(theta)dtheta=int Pr(y=1| 
 \theta)Pr(theta)dtheta=0.5. A non-informative prior distribution for \theta 
 could be U(0,1). This also induces a prior on y of the same form, int Pr(y=0| 
 \theta)Pr(theta)dtheta=int Pr(y=1| \theta)Pr(theta)dtheta=0.5, but that is 
 not the main motivation for choosing U(0,1).
 
 For example, is this not worrying:
 
 library(ape)
 n-100
 tree-rcoal(n) # random tree
 y-rbinom(n, 1, 0.2)  # random data unconnected to the tree
 m1-ace(y, tree, type = d, model=SYM)
 m2-ace(y, tree, type = d, model=ARD)
 anova(m1, m2) # asymmetric evolutionary transition rates strongly 
 supported
 
 y-rbinom(n, 1, 0.5)  # random data unconnected to the tree but p=0.5
 m1-ace(y, tree, type = d, model=SYM)
 m2-ace(y, tree, type = d, model=ARD)
 anova(m1, m2) # asymmetric evolutionary transition not supported
 
 Cheers,
 
 Jarrod
 
 
 
 
 
 
 Quoting Jarrod Hadfield j.hadfi...@ed.ac.uk on Thu, 16 Aug 2012 12:30:30 
 +0100:
 
 Hi,
 
 I have been helping someone with some analyses and came across some routines 
 to estimate asymmetric transition rates between discrete characters. This 
 surprised me because its fairly straightforward to prove that asymmetric 
 transition rates cannot be identified using data collected on the tips of a 
 phylogeny. However when I run these routines (e.g. ace) likelihood ratio 
 tests often suggest strong support for asymmetric rates. This seems to arise 
 because there is an implicit (and unjustified) prior point mass on the 
 ancestral state *probability*. If anyone could point me to reference that 
 states this that would be great? I really don't want to be saying we have 
 support for processes that we need a fossil record, not just a phylogeny, to 
 understand.
 
 Cheers,
 
 Jarrod
 
 
 
 -- 
 The University of Edinburgh is a charitable body, registered in
 Scotland, with registration number SC005336.
 
 ___
 R-sig-phylo mailing list
 R-sig-phylo@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
 
 
 
 
 
 -- 
 The University of Edinburgh is a charitable body, registered in
 Scotland, with registration number SC005336.
 
 ___
 R-sig-phylo mailing list
 R-sig-phylo@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
 
 

 


[[alternative HTML version deleted]]

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] asymmetric transitions

2012-08-16 Thread Dan Rabosky

Hi All-

This is an interesting discussion. I think there is clearly something going on. 
I do not get catastrophic Type I error rates from this exercise (only 
'elevated') with discrete char simulations (an equal rates markov process) - 
see below. However, Jarrod's latent model seems reasonable and probably a 
decent approximation of many real phenomena, so it is extremely worrying that 
Type I error rates would be so high. 

Also, I don't see why the BiSSE framework would be immune to this problem. If 
the true process of character change (regardless of its relationship to 
speciation and extinction) occurs with a latent/liability model, it seems that 
BiSSE would also be affected.  

This is something to think (and worry) about.

#Simulations using pure birth tree with 100 tips; birthrate = 1.0. 
# pure birth used as these branch lengths are more consistent with those in a 
typical interspecific dataset

#Rate = 0.01 
Mean frequency of root state in pure birth tree of 100 tips: 0.930
Type I error rate: 0.12
 
# Rate = 0.05.
Frequency of root state: 0.86
Type I error rate: 0.08

# Rate = 0.1.
Frequency of root state: 0.77
Type I error rate: 0.08

#Rate = 1.0
Frequency of root state: 0.501
Type I error rate: 0.03

#Rate = 10
Frequency of root state: 0.49
Type I error rate: 0.04

 

#Simulation code

 library(geiger);

# quick fix to deal with tree simulations in Geiger, eg birthdeath.tree
#   stops at exactly N taxa, but this gives a pair of 0 length branches:

x - birthdeath.tree(b=1, d=0, taxa.stop=101);
x - drop.tip(x, x$tip.label[x$edge[,2][x$edge.length==0][1]]);

# We drop one of the tips with 0 length branches...
 

## Set up rate matrix:
rate1 -1;
qmat - list(rbind(c(-rate1, rate1), c(rate1, -rate1))); #FAST rate matrix

## PLot a tree with character states, just to get a feel for phylogenetic 
##  signal in the character state data:

sim -sim.char(x, qmat, model=discrete, n=1, root.state=1);
cvec - c('black', 'white');
plot.phylo(x, show.tip.label=F);
tiplabels(pch=21, col='black', bg=cvec[sim], cex=0.9); 
 
NSIMS - 100;
fdif - numeric(NSIMS); # vector for character frequencies
pvec - numeric(NSIMS); # vector for pvalues

for (i in 1:NSIMS){
cat(i, '\n');
sim -sim.char(x, qmat, model=discrete, n=1, root.state=1);

# make sure at least both states are observed in dataset:
while (sum(sim==1) == length(sim)){
sim -sim.char(x, qmat, model=discrete, n=1, root.state=1);
}

tx -table(sim);
fdif[i] - tx['1'] / sum(tx);

m1 - ace(sim, x, type = 'd', model ='SYM');
m2 - ace(sim, x, type = 'd', model = 'ARD');
lrt - -2*m1$loglik + 2*m2$loglik;
pvec[i] - 1 - pchisq(lrt, 1)

}

mean(fdif);
sum(pvec = 0.05);



On Aug 16, 2012, at 11:04 AM, Matt Pennell wrote:

 Jarrod and Dan,
 
 While I see what Dan is saying and I agree that evaluating this with 
 non-phylogenetic data is not entirely useful, I think you have stumbled upon 
 a known issue but one that is not widely appreciated.
 
 While the MK model is a useful model for discrete characters in many ways, it 
 may be inappropriate for such a test. If you assume that the root is equally 
 likely to be in either state with a lot of tips (i.e. approaching the limit), 
 the ML estimate for the ratio of q01 (transition rate from 0 to 1) to q10 
 will converge on the ratio between number of tips in state 1 to number of 
 tips in state 0. So if you have a tree where most of the tips are in state 1 
 then you will get support for the asymmetric change model as this best 
 explains the data. It is a very weird problem and perhaps I am incorrect in 
 this. 
 
 Regardless, this result does not hold if the discrete character is modeled 
 simulataneously with the branching process of the phylogeny (i.e. BiSSE; 
 Maddison et al. 2007).
 
 So, in summation, I mostly agree with you. But this is not shocking behavior 
 of the model. If you are interested, a Bayesian implementation of the Mk 
 model can be found in the package diversitree in the function make.mk.
 
 again, i could be off base here. curious to see what others think?
 
 matt
 
 On Thu, Aug 16, 2012 at 10:58 AM, Dan Rabosky drabo...@umich.edu wrote:
 
 HI Jarrod-
 
 It isn't immediately obvious to me why the exercise below reflects something 
 problematic. In the first scenario, you have a random binary state but with 
 strong differences in frequency. Because there is effectively no phylogenetic 
 signal (as data are simply random), this suggests a high rate of transition 
 between states. I think that such asymmetry in frequencies would be highly 
 unlikely under a symmetric model of character change regardless of the root 
 state. I think it is useful to think about whether a symmetric process could 
 have given rise to a truly random distribution of tip data with strong 
 frequency differences - my intuition is that this is highly unlikely. 
 However, I would

Re: [R-sig-phylo] asymmetric transitions

2012-08-16 Thread Dan Rabosky
:50 to favor the ARD model over the Mk model. In fact, the simulation 
 model is equivalent the ARD model with a rate of evolution approaching 
 infinity, so we should prefer the ARD model.
 
   3. (in reference to Matt's post) In the ARD model, it is possible for 
 the MLE of the equilibrium state frequency of the less frequent state to be 
 0.5. Presumably, this is a rare occurrence, but I don't agree with the 
 characterization of the ratio of parameters converging to the ratio of 
 states. 
 
   Consider a clade with lots of tips in state 1 but tiny branch lengths. 
 If this clade is found in the context of a tree with a few other branches 
 that are long and lead to tips with state 0, then you can get an MLE of the 
 state frequency for state 0 being  0.5. Most of the tips will have state 1, 
 but because they are easily explained by one transition to 1 you can still 
 infer that the less frequent state has a higher equilibrium frequency.
 
   Perhaps, I'm mis-reading what Matt is referring to when he discusses an 
 analysis of a tree with a lot of tips (ie. approaching the limit). I do 
 agree that if you simulate a very large tree under ARD (with the frequencies 
 not equal to 0.5), then the frequency of the states at the tips will converge 
 to the equilibrium state frequencies. 
   
 
 
 With respect to Dan's results:
 
 The Type I error rate of 0.12 troubles me.  Have you tried exporting the data 
 and seeing if other software agrees with the likelihood ratios returned by 
 ace() ?  I looked at the code for ace and nothing looked amiss to me (though 
 my R skills are virtually non-existent).
 
 If the result is corroborated by other software, then my best guesses would 
 be:
   1. ascertainment bias (the simulation model clearly excludes constant 
 patterns, but I don't believe that the inference model in ace does any 
 correction for this), and/or
   2. the assumption that you can use the chi-square as the null 
 distribution for the LRT probably breaks down when you have very few events 
 on the tree. In some sense the number of events is our measure of the amount 
 of data, and when we have very few events on the tree the asymptotic behavior 
 of the LRT under the null is probably not going to help us. 
 
 In the limiting case, when rates of character change are so low that you 
 never see homoplasy, then I think the LRT of the the two models should get 
 close to 1 on virtually any realization (conditional on starting in state 1 
 and having exactly 1 change on the tree, both model make the same predictions 
 about the data; so in this realm the data should not prefer one model over 
 the other).  So, I'm not sure how the small data explanation would explain 
 your observation of an excess of large LRT statistics.
 
 
 those are my 2 cents.
 
 all the best,
 Mark
 
 

_
Dan Rabosky
Assistant Professor
Dept of Ecology and Evolutionary Biology
 Museum of Zoology
University of Michigan
drabo...@umich.edu

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] read.dna warnings and pitfalls

2012-04-26 Thread Dan Rabosky

Hi Emmanuel-

Thanks for fixing the whitespace issue. I think this fix will be useful to many 
users. 

On the issue of recognizing 10 IUPAC characters: I think this is a real 
problem, and may come up again in short order. Maybe it is just that use of 
this function has been limited? In the single dataset with a modest number of 
sequences that caused me problems yesterday, I had the following species and/or 
genus names - all of which constitute 10 character strings drawn from the set 
of IUPAC codes:

brachyurus (x 2)
savannarum
graduacauda
caudacutus
Camarhynchus (x 3)
madagascariensis

I don't suggest deprecating the phylip sequential, but rather, using something 
that is compatible with raxml (surely one of the most widely used phylogenetics 
programs today). I think raxml uses a relaxed sequential version of the phylip 
format with whitespace delimitation. I could read the same alignment in raxml 
with no problems, but I had multiple issues when reading the same file with 
read.dna (including the whitespace character on the first line). My guess is 
that very few people are using the original phylip format, with its limit of 10 
characters per taxon name, and with dna seqs beginning immediately after this. 
So maybe deprecate sequential phylip, but you could use what Stamatakis calls 
relaxed sequential PHYLIP, which appears to be: (1) taxon names cannot 
include spaces but can be up to 100 characters; and (2) names separated from 
sequences by whitespace character (ideally, this should recognize any number of 
spaces or tabs to prevent user confusion).

For users with tab-delimited raxml files (eg each taxon name separated from its 
dna sequence by a tab), you can use a regular-expressions enabled text editor 
(like textwrangler) to quickly find potential problems. Just search for 

[ACGTUMRWSYKVHDBN]{10}.+\t

with grep matching enabled. 

Cheers,
~Dan


On Apr 26, 2012, at 2:16 AM, Emmanuel Paradis wrote:

 Hi Dan,
 
 The reason for this implementation (searching the first 10 IUPAC-coded bases) 
 is because the exact formatting is not inconsistent among different programs. 
 Some files have:
 
 0123456789acgt.
 
 that is a 10-character name and the sequence starting on the 11th position. I 
 think this is typical for Phylip. Other software (e.g., PhyML) accepts longer 
 taxa names and require a space before the start of the sequence.
 
 About your example: it depends on the order of the data. The following file 
 can be read:
 
 2 10
 x AA
 madagascarAA
 
 But if you invert the two sequence lines, it fails.
 
 It is the first time I hear about this problem in 9 years, maybe because it 
 requires a particular combination of circumstances. Another drawback of this 
 implementation is that files with less than 10 bases cannot be read.
 
 How to solve this? If it were left only to me, I would deprecate the 
 interleaved and sequential formats. FASTA is more flexible, more widespread, 
 easier to parse, can store exactly the same information, and labels are only 
 constrained to be on a single line (but can contain any characters including 
 \n, \t, ...) But I guess many programs use the Phylip formats, so I'd be glad 
 to read other suggestions.
 
 As for your 2nd problem, it is now fixed in ape.
 
 Best,
 
 Emmanuel
 -Original Message-
 From: Dan Rabosky drabo...@umich.edu
 Sender: r-sig-phylo-boun...@r-project.org
 Date: Wed, 25 Apr 2012 17:51:35 
 To: r-sig-phylo@r-project.org
 Subject: [R-sig-phylo] read.dna warnings and pitfalls
 
 
 Hi All-
 
 I have spent an inordinate and embarrassing amount of time tracking down an 
 excruciatingly cryptic issue with read.dna, which I rarely use. Here are two 
 key problems:
 
 1) The function automatically assumes it is reading DNA sequences when it 
 encounters a string of 10 continuous DNA-like characters. This includes all 
 characters in the set (ACGTUMRWSYKVHDBN-). This function, unlike the phylip 
 original, does not have limits on taxon name lengths. Hence, I had - in the 
 middle of a large alignment - a species whose name included the string 
 MADAGASCAR, which caused a failure.  To be fair, the documentation warns of 
 this, but I think this is extremely easy to overlook, and - moreover - it 
 seems unfortunate to have to parse all your taxon names for a potential IUPAC 
 match before trying to use the function. Presumably, most users who specify 
 sequential spacing will be using whitespace to separate taxon names from DNA 
 sequences, and perhaps it is better to exploit this rather than IUPAC 
 matching. 
 
 2) The function is whitespace-sensitive. if you tab-separate the numbers on 
 the first line (numbers of taxa, numbers of sites), you'll receive an errror 
 with the message: the first line of the file must contain the dimensions of 
 the data. It appears that spaces are OK, however. 
 
 Hopefully this post will be useful to somewhere in the future with a similar 
 issue. Perhaps these can be addressed

[R-sig-phylo] read.dna warnings and pitfalls

2012-04-25 Thread Dan Rabosky

Hi All-

I have spent an inordinate and embarrassing amount of time tracking down an 
excruciatingly cryptic issue with read.dna, which I rarely use. Here are two 
key problems:

1) The function automatically assumes it is reading DNA sequences when it 
encounters a string of 10 continuous DNA-like characters. This includes all 
characters in the set (ACGTUMRWSYKVHDBN-). This function, unlike the phylip 
original, does not have limits on taxon name lengths. Hence, I had - in the 
middle of a large alignment - a species whose name included the string 
MADAGASCAR, which caused a failure.  To be fair, the documentation warns of 
this, but I think this is extremely easy to overlook, and - moreover - it seems 
unfortunate to have to parse all your taxon names for a potential IUPAC match 
before trying to use the function. Presumably, most users who specify 
sequential spacing will be using whitespace to separate taxon names from DNA 
sequences, and perhaps it is better to exploit this rather than IUPAC matching. 

2) The function is whitespace-sensitive. if you tab-separate the numbers on the 
first line (numbers of taxa, numbers of sites), you'll receive an errror with 
the message: the first line of the file must contain the dimensions of the 
data. It appears that spaces are OK, however. 

Hopefully this post will be useful to somewhere in the future with a similar 
issue. Perhaps these can be addressed in a future update to ape? 

-Dan Rabosky

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


[R-sig-phylo] read.dna warnings and pitfalls

2012-04-25 Thread Dan Rabosky

Hi All-

I have spent an inordinate and embarrassing amount of time tracking down an 
excruciatingly cryptic issue with read.dna, which I rarely use. Here are two 
key problems:

1) The function automatically assumes it is reading DNA sequences when it 
encounters a string of 10 continuous DNA-like characters. This includes all 
characters in the set (ACGTUMRWSYKVHDBN-). This function, unlike the phylip 
original, does not have limits on taxon name lengths. Hence, I had - in the 
middle of a large alignment - a species whose name included the string 
MADAGASCAR, which caused a failure.  To be fair, the documentation warns of 
this, but I think this is extremely easy to overlook, and - moreover - it seems 
unfortunate to have to parse all your taxon names for a potential IUPAC match 
before trying to use the function. Presumably, most users who specify 
sequential spacing will be using whitespace to separate taxon names from DNA 
sequences, and perhaps it is better to exploit this rather than IUPAC matching. 

2) The function is whitespace-sensitive. if you tab-separate the numbers on the 
first line (numbers of taxa, numbers of sites), you'll receive an errror with 
the message: the first line of the file must contain the dimensions of the 
data. It appears that spaces are OK, however. 

Hopefully this post will be useful to somewhere in the future with a similar 
issue. Perhaps these can be addressed in a future update to ape? 

-Dan Rabosky

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] birthdeath {ape} fails every time.

2012-04-19 Thread Dan Rabosky

If you want simple fitting of a constant-rate birth-death process to a 
phylogeny, another option is in diversitree (see make.bd). Last time I checked, 
bd in laser (and birthdeath in ape) used transformations that kept mu  lambda, 
but this is not the case for diversitree. The package is well-maintained. 
Treepar might also have something like this as well. Diversitree also allows 
you to incorporate incomplete taxon sampling.

With diversitree, this would basically just be:

myLikelihoodFunction - make.bd(myTree, sampling.f = 1.0)
pars_init - c(0.1, 0.05)
find.mle(myLikelihoodFunction, pars_init)

~Dan Rabosky


On Apr 19, 2012, at 5:02 PM, Matt Pennell wrote:

 Simon,
 
 For reasons that I do not fully understand, the problem arises with the use
 of a coalescent tree (which has a very branch length distribution than a
 tree generated under a birthdeath process). maybe someone else has some
 insight into why this causes birthdeath to fail.
 
 if you run
 library(geiger)
 xx - birthdeath.tree(1, 0, taxa.stop=100)
 birthdeath(xx)
 
 it works. However it does produce warning messages generated in the
 optimization procedure (which should not be cause to worry; laser's
 equivalent function just suppresses these warnings).
 
 if you really want to use a coalescent tree you can still use
 library(laser)
 yy - rcoal(100)
 bd(branching.times(yy))
 
 hope this helps.
 matt
 
 
 On Thu, Apr 19, 2012 at 4:52 PM, Simon Greenhill s.greenh...@auckland.ac.nz
 wrote:
 
 Morning all,
 
 Whenever I try to use ape's (v 3.0-2) birthdeath function e.g.
 
 birthdeath(rcoal(sample(10:100, 1)))
 
 I get the following error:
 
 Error in while (foo(up)  0) up - up + inc :
 missing value where TRUE/FALSE needed
 
 The warnings all appear to be this sort of thing:
 
 1: In log(1 - a) : NaNs produced
 2: In log(exp(r * x[2:N]) - a) : NaNs produced
 3: In nlm(function(p) dev(p[1], p[2]), c(0.1, 0.2), hessian = TRUE) :
 NA/Inf replaced by maximum positive value
 
 and seem to be generated from this line:
   out - nlm(function(p) dev(p[1], p[2]), c(0.1, 0.2), hessian = TRUE)
 
 
 Is there a fix/workaround for this?
 
 Kind regards,
 
 Simon
 
 ___
 R-sig-phylo mailing list
 R-sig-phylo@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
 
 
   [[alternative HTML version deleted]]
 
 ___
 R-sig-phylo mailing list
 R-sig-phylo@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
 
 

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] Estimating MRCA dates

2012-02-02 Thread Dan Rabosky

Hi Brian-

It has been awhile since I played with chronopl(), but at the time, I convinced 
myself that results were substantially different from penalized likelihood as 
implemented in r8s. I haven't explored this exhaustively, but at the time, I 
was very much convinced that the results of chronopl were potentially 
problematic. Other folks have mentioned to me that they've observed similar 
differences between r8s and chronopl and that they've recovered bizarre 
smoothed trees from chronopl. Perhaps someone with more recent chronopl 
experience can weigh in on this...

~Dan



On Feb 2, 2012, at 6:06 AM, Brian O'Meara wrote:

 Ape can do this, function chronopl. In general, one good way to find out
 which package can do something is by looking at the task view for
 phylogenetics: http://cran.r-project.org/web/views/Phylogenetics.html .
 This will also allow you to install all the phylogenetics packages (at
 least those on CRAN) with just a couple of commands. An updated version
 should be coming out later today (and if someone notices other things
 missing, please let me know).
 
 Brian
 
 ___
 Brian O'Meara
 Assistant Professor
 Dept. of Ecology  Evolutionary Biology
 U. of Tennessee, Knoxville
 http://www.brianomeara.info
 
 Students wanted: Applications due Dec. 15, annually
 Postdoc collaborators wanted: Check NIMBioS' website
 Funding wanted: Want to collaborate on a grant?
 
  

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] How to compute branching.time for specific node

2011-10-02 Thread Dan Rabosky

Hi Axel-

You can pull out the node number as you note below:

mynode - mrca(treefile)[taxon1, taxon2]

and then pull out the relative branching time as 

branching.times(treefile)[as.character(mynode)]

using the as.character() tells R to treet the node number as a name 
attribute, and since the branching.times() vector has names, it should give you 
the relevant time. I'm not sure what you mean by append this command, but it 
sounds like you want to do this to a bunch of trees in a file (?). I'd write a 
function, read in all your trees, and then use lapply. E.g:

trees - read.tree('mytrees.tre')

myFx - function(phy, tx1, tx2){
mynode - mrca(treefile)[ tx1, tx2 ]
bt - branching.times(phy)[as.character(mynode)]
return(bt)
}

mytimes - lapply( trees, myFx, tx1 = taxon1, tx2 = taxon2 ); #note that 
using lapply returns a list

unlisted.times - unlist(mytimes); #convert 'mytimes' to a vector


Cheers,
~Dan


On Oct 1, 2011, at 10:53 PM, Schoenhofer, Axel wrote:

 Hi everybody,
 I try to wrap my head around how to get a node age from a set of dated 
 ultrametric tree and only for a specific node, to be specified by tip taxa. 
 I'd further like to write them to a file, each value to a new line, to get 
 them in Excel or use them furthe r in R.
 The following comands I found usefull for my task but could not pierce them 
 together:
 
 mrca(treefile)[taxon1,taxon2] # will return the respective node number. I 
 need this as it will give me a node regardless where taxa are in the tree.
 
 branching.times(treefile)# problem is that I get a list of all 
 nodes and branching times and I would like to specify it for only the 
 mrca-node.
 
 I read something about specifying node.names to use them in combination with 
 branching.times but could neither of those get to work.
 
 How would I append this command to a consecutive Newick trees in a file?
 Thanks so much for any suggestion.
 Axel
 
   [[alternative HTML version deleted]]
 
 ___
 R-sig-phylo mailing list
 R-sig-phylo@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


 
Daniel L Rabosky
Department of Integrative Biology
University of California, Berkeley
Berkeley, CA 94720

[starting 2012]
Assistant Professor
Department of Ecology and Evolutionary Biology
University of Michigan

Assistant Curator of Herpetology,
University of Michigan Museum of Zoology
 




[[alternative HTML version deleted]]

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] fitContinuous in geiger

2011-05-23 Thread Dan Rabosky

Hi Carl, Annemarie-

While it is possible in principle that Annemarie's results reflect true ML 
estimate of lambda = 1, I think the practical reason this is occurring is that 
the default bounds on lambda in fitContinuous are 1e-7 and 1. Because 
optimization in fitContinuous uses a bounded BFGS algorithm, hard bounds are 
imposed at those values. You could increase the bounds to lambda  1, and it is 
possible for lambda to exceed 1. This may not make sense, and it is also 
possible that the optimization will fail when it tries a lambda  1, because 
enabling lambda to exceed 1 may result in an NA for the likelihood (Liam has 
provided some earlier discussion of this issue). 

But the important point is that no matter what the ML value of lambda, you are 
limited to those default bounds as specified in fitContinuous - unless you 
explicitly increase them! This is also relevant to other parameters associated 
with other trait evolution models, so it is good to ask questions if you 
observe behavior like thisfor lambda, delta, or anything else. It could 
even happen to the Brownian rate parameter if the phylogeny branch lengths are 
small relative to trait variance, because the default upper bound is 20.

Cheers,
~Dan






On May 23, 2011, at 11:46 AM, Carl Boettiger wrote:

 Hi Annemarie,
 
 No problem, tried to give some answers below.
 
 
 On Mon, May 23, 2011 at 8:05 AM, Annemarie Verkerk annemarie.verk...@mpi.nl
 wrote:
 
 Dear Carl, Liam, and others,
 
 thanks for your explanation of what went wrong in the fitContinuous
 calculations. I set beta to a large number (100) in order to stop it
 from reading the maximum value. Then, I got exactly the same results for
 lambda with the non-multiplied and the multiplied data.
 
 Okay, I still have two more problems - probably they will sound stupid to
 you, but I am still very much a newbie to this and if it suffices just to
 point me to other sources where this has already been discussed please do
 so!
 
 the log likelyhood: between the non-multiplied and the multiplied data,
 there is still a difference. Liam, you write 'if the variance of this
 distribution is small, the value of the function will be large (i.e.,
 greater than 1.0)'. However, the variance between the non-multiplied and the
 multiplied data is exactly the same. So why should log likelyhood values
 change when data is multiplied? Why do I get a normal looking value (around
 -200) (with the beta scale set for a very large scale) for the multiplied
 data? And why does the log likelyhood become so big (-2000) when the initial
 maximum beta value of 20 is reached if the scale is (0,20)?
 
 
 Liam is referring to the variance of the likelihood distribution, not the
 variance of the traits.  If the diversification rate is high, then any
 particular outcome is very improbable, so its probability density has high
 variance and corresponding low value for the prob density at any particular
 value.  Hence the large negative log-likelihood for large beta.  (Recall log
 lik around -2000 means a density of exp(-2000), which is very near zero,
 illustrating why we need to use logs!)
 
 
 the lambda scores: now, my lambda scores for the non-multiplied and the
 multiplied data are the same. However, most of them (999 out of 1000) are
 '1'. To my understanding, '0' and '1' are theoretical limits for lambda that
 one normally does not reach. So I'm afraid I'm still unsure why this
 happens, and what it means.
 
 
 You raise an important point here.  Just because they are the boundary
 values, does not imply that these theoretical limits are uncommon -- in fact
 one may expect to hit the limits more often than any other value.  Numerical
 optimization will often push estimates of a parameter all the way to its
 boundary.  This just means that increasing (deceasing) the parameter value
 increases the likelihood, so it keeps doing so until it can go no further.
 This can result from, but does not necessarily imply, that the model is
 inappropriate (it is also particularly common for small numbers of taxa, for
 instance), and can be more common on relatively flat likelihood surfaces.
 
 So I guess my short answer is this isn't a problem and my longer answer is
 be suspicious whenever you get back boundary estimates, and consider
 bootstrapping.
 
 
 HTH,
 
 Carl
 
 I hope you don't mind this new response to the thread.
 
 With kind regards,
 Annemarie
 
 
 Liam J. Revell wrote:
 
 Hi Annemarie,
 
 Positive log-likelihoods are not a problem.  The log-likelihood is
 calculated by summing the log probability densities, which come from a
 function that integrates to 1.0.  Thus, if the variance of this distribution
 is small, the value of the function will be large (i.e., greater than 1.0).
 
 The phenomenon of decreasing mean lambda when you increase the scale
 (i.e., multiply by 10 or 100) is probably due to bounds on beta (in the
 lambda model, sigma^2) in fitContinuous().  The default upper bound is 

Re: [R-sig-phylo] Conditioning on Total Tips in Birth-Death Trees

2011-03-23 Thread Dan Rabosky

Hi Dave-


 Are there any birth-death tree functions which condition on the total
 number of tips (extinct and extant) on a tree rather than the number
 of surviving tips?

You can recode the birthdeath.tree function from Geiger to do this if you want. 
The total number of species in the tree (living plus extinct) is equal to 
nrow(edge)/2, where edge is the edge matrix. Find the repeat statement and 
replace the line:

if (taxa.stop) 
if (sum(alive) = taxa.stop) 
  break

with the following:

if (taxa.stop)
if ((nrow(edge)/2) = taxa.stop)
break;

The new birthdeath.tree function should stop when taxa.stop lineages have been 
born (regardless of whether they've subsequently gone extinct).

  I was wondering if it was known what
 the expected probability distribution of branch lengths for a
 fully-sampled phylogeny (includes all extinct lineages) is?

If you have a perfectly sampled tree, branches should be exponentially 
distributed with rate (lambda + mu). Events occur in the birth-death process 
with rate (lambda + mu), and the probability that a given event will be 
speciation is lambda/(lambda + mu).

~Dan



 I've been
 trying to figure out, if for a fully extinct tree with homogenous
 rates,  whether we would expect the branch lengths of internal
 branches to follow an exponential distribution based on the birth
 parameter (and extinct terminal edges following an exponential
 distribution based on the death parameter) or if the branch lengths
 are a function of both parameters. Perhaps I'm not using the right
 keywords, but my literature searches haven't found the information I'm
 looking for.
 
 Thanks,
 -Dave
 
 -- 
 David Bapst
 Dept of Geophysical Sciences
 University of Chicago
 5734 S. Ellis
 Chicago, IL 60637
 http://home.uchicago.edu/~dwbapst/
 
 ___
 R-sig-phylo mailing list
 R-sig-phylo@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-phylo





[[alternative HTML version deleted]]

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] Q: Likelihood ratio test and rate shift test on complete dated phylogeny

2011-03-22 Thread Dan Rabosky

Hi Dave-

 If I understand correctly, that's a separate issue than what Ted is
 getting at; I think he's worried about the effect of data dredging by
 running all possible models (as Burnham and Anderson warn against) and
 not mis-parametrization of the simple model.

I don't think I'd call this data dredging. You are still comparing two models, 
but one has a shift parameter (the two-rate model) and one doesn't. Finding the 
maximum likelihood estimate of the shift parameter involves looking at the 
combined likelihood of the data partitions when the shift location is treated 
as a free parameter. However, assessing the *improvement in fit* of the 2 rate 
model over the simple 1 rate model is the trickier issue and may require 
simulation under the null hypothesis to evaluate correctly, as Joseph points 
out.

~Dan




 Data-dredging will bias
 us toward finding complex models that fit overly well. The common
 solution in data mining is to subset your data and thus confirm a
 preferred model for one subset by testing the preference in another
 subset, but I don't think that can be applied to studies with a single
 tree.
 
 -Dave
 
 On Tue, Mar 22, 2011 at 12:37 PM, Dan Rabosky drabo...@berkeley.edu wrote:
 
 Hi Ted-
 
 Others can weigh in here, but - at the very least - I would consider a model 
 with 2 rates as having at least 1 additional parameter (the location of the 
 rate shift). This seems to perform OK in practice. I admit that I have 
 always been a bit uncomfortable with this, though - the null hypothesis 
 model does not have a shift point. I think there are parallels here to 
 finding breakpoints in segmented regression models. Here's a paper I've been 
 meaning to read that might be relevant.
 
 Davies, R.B. (1987) Hypothesis testing when a nuisance parameter is present 
 only under the alternative. Biometrika 74, 33–43.
 
 ~Dan
 
 
 
 
 
 
 
 On Mar 22, 2011, at 10:19 AM, tgarl...@ucr.edu tgarl...@ucr.edu wrote:
 
 This can also be done using the package diversitree, with the
 function make.bd.split and variants thereof. You will probably
 have to write a function to iteratively partition the tree
 into all possible splits.
 
 Hi Dan,
 
 How would this approach deal with the problem of multiple comparisons, 
 i.e., in effect testing many different hypotheses (or exploratory data 
 mining) on the same data set and overall tree?
 
 Cheers,
 Ted
 
 
 Theodore Garland, Jr.
 Professor
 Department of Biology
 University of California, Riverside
 Riverside, CA 92521
 Office Phone:  (951) 827-3524
 Wet Lab Phone:  (951) 827-5724
 Dry Lab Phone:  (951) 827-4026
 Home Phone:  (951) 328-0820
 Facsimile:  (951) 827-4286 = Dept. office (not confidential)
 Email:  tgarl...@ucr.edu
 
 Main Departmental page:
 http://www.biology.ucr.edu/people/faculty/Garland.html
 
 List of all Publications:
 http://www.biology.ucr.edu/people/faculty/Garland/GarlandPublications.html
 
 Garland and Rose, 2009
 http://www.ucpress.edu/books/pages/10604.php
 
 
   Original message 
 
Date: Tue, 22 Mar 2011 09:44:26 -0700
From: Dan Rabosky drabo...@berkeley.edu
Subject: Re: [R-sig-phylo] Q: Likelihood ratio test and rate shift
test on complete dated phylogeny
To: R.J. den Tex rjden...@yahoo.com
Cc: r-sig-phylo@r-project.org
 
 
 Hi Robert-
 
 (i) Can you use a Likelihood ratio test to test whether a birth
death model fits your data better than a pure birth model?
(likelihood scores are obtained from LASER).
 
 Yes, this would be valid, as the models are nested.
 
 (ii) How can I test if a rate shift has occurred on a particular
node/branch in a complete taxon phylogeny?
 
 There are lots of possible methods for this. Or perhaps it is
better to say that there are lots of variants of a single general
approach. There are the 1-rate versus 2-rate model in LASER
(fitNDR_2rate, fitNDR_1rate) after Rabosky et al. (Proc. R. Soc. B,
2007); there is the MEDUSA approach, which would allow more than 1
rate shift across your tree (Alfaro et al., PNAS; this method is
implemented in GEIGER).
 
 This can also be done using the package diversitree, with the
function make.bd.split and variants thereof. You will probably have
to write a function to iteratively partition the tree into all
possible splits. Actually, I think the diversitree calculations
*may* be the most accurate if you have different extinction rates
across the tree.
 
 There is also the method from Moore and Donoghue (2009, PNAS) - i
think tRate is the name of the package - although I don't think this
is distributed on the archive for R packages.
 
 ~Dan Rabosky
 
 
 
 Thank you very much beforehand!
 
 Robert den Tex
 PhD student
 Dept. Evolutionary Biology
 EBC, Uppsala University
 Uppsala,
 Sweden
 
 ___
 R-sig-phylo mailing list
 R-sig-phylo@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
 
 
 
 
 
 [[alternative HTML

Re: [R-sig-phylo] mntd and mpd without community data

2011-03-17 Thread Dan Rabosky

Hi Scott-

There are lots of indices to quantify tree shape differences, both in terms of 
topology (Colless etc) and temporal differences (e.g., Pybus and Harvey's 
gamma). MNTD and MPD will largely capture the temporal dimension of your trees 
and will be highly correlated with gamma. For example, if you have long 
terminal branches in your tree and with very short internal branches, gamma 
will be negative and both MPD and MNTD will be large. 

I don't think there is a right or wrong answer here and any index might be 
appropriate, but you will have to think hard about what exactly you want to 
quantify (e.g, topological imbalance or asymmetry versus branching time 
differences) and why. It might be good to think about standardized metrics, to 
control for differences in the number of taxa in trees. You may want to use 
several indices in concert.

~Dan Rabosky

On Mar 17, 2011, at 8:45 AM, Scott Chamberlain wrote:

 Hello, 
 
 
 I am curious if it is appropriate to calculate mntd (mean nearest taxon 
 distance) and mpd (mean pairwise distance) in the picante package on trees 
 themselves, that is, without community data. 
 
 We are trying to think of informative metrics that can tell us something 
 about tree shape among lots of different trees. Using mntd and mpd we give 
 the functions a tree and just a vector of all 1's for the community data so 
 that each species is equally abundant. 
 
 Does this approach make sense? Are there better metrics to use given that we 
 are just dealing with trees without community data? 
 
 (I am aware of Sackin's, Colless', beta splitting, gamma, etc.)
 
 
 
 Here is a reproducible example of what I am doing:
 require(picante)
 require(ape)
 
 tree - rcoal(10)
 abund - rep(1, 10)
 names(abund) - tree$tip.label
 mpd_ - mpd(rbind(abund, abund), # can't have just one vector/community 
 apparently
 cophenetic(tree))[1] # just one of the two numbers needed as they are the same
 
 
 
 Sincerely, 
 Scott Chamberlain
 Rice University, EEB Dept.
 
 
 
   [[alternative HTML version deleted]]
 
 ___
 R-sig-phylo mailing list
 R-sig-phylo@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

 





[[alternative HTML version deleted]]

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] reverse order plotting of newick tree/phylo object

2011-03-17 Thread Dan Rabosky

 But can you explain to me what is the rationale behind this? There are
 only 47 nodes and 54 tips. How can the nodes from 55 to 47 than be
 rotated?


Liam's code is rotating nodes 55+(1:47), rather than 55 to 47. The internal 
node indexing in ape's phylo class starts with number of tips plus 1. Thus, 
using rotate on nodes starting with length(phy$tip.label) + 1 means it is 
starting with node 56 (the root node) and going through length(phy$tip.label) + 
phy$Nnode, e.g.,, all internal nodes. You can see that

55+(1:47)

gives a vector of node numbers that is not the same as 55 to 47.

If it isn't working right - maybe it is that you have polytomies? I think you 
should have n-1 internal nodes (53 in your case). Try multi2di (from ape) for 
polytomy resolution and see if it works as desired.

~Dan Rabosky



 
 Kind regards,
 
 Thierry
 
 Thierry Janssens
 Postdoctoral researcher
 Delft University of Technology
 Bionanoscience
 Kavli Institute of Nanoscience
 Lorentzweg 1
 2628LJ Delft
 the Netherlands
 Tel: +31 15 2781175
 Fax:+31 15 2781202
 e-mail: t.k.s.janss...@tudelft.nl
 
 -Original Message-
 From: Liam J. Revell [mailto:liam.rev...@umb.edu] 
 Sent: donderdag 17 maart 2011 16:11
 To: Thierry Janssens - TNW
 Cc: r-sig-phylo@r-project.org
 Subject: Re: [R-sig-phylo] reverse order plotting of newick tree/phylo
 object
 
 Hi Thierry,
 
 There might be a more elegant way to do this, but you can just apply the
 ape function rotate() to each node number of the tree (excluding
 tips).
 
 I.e.
 
 tr2-tree
 for(i in length(tr2$tip)+1:tr2$Nnode) tr2-rotate(tr2,i)   plot(tr2)
 
 [rotate() may also be able to take a vector of nodes, but I was not able
 to get this to work.]
 
 - Liam
 
 --
 Liam J. Revell
 University of Massachusetts Boston
 web: http://faculty.umb.edu/liam.revell/
 email: liam.rev...@umb.edu
 blog: http://phytools.blogspot.com
 
 On 3/17/2011 10:51 AM, Thierry Janssens - TNW wrote:
 Dear R-sig-phylo,
 
 
 
 I am looking for a method to plot an unrooted tre/phylo object e in 
 the reverse order (of the tip labels). Like all the nodes would have 
 rotated.
 
 
 
 Any of you has an idea?
 
 
 
 Kind regards,
 
 
 
 Thierry
 
 
 
 Thierry Janssens
 
 Postdoctoral researcher
 
 Delft University of Technology
 
 Bionanoscience
 
 Kavli Institute of Nanoscience
 
 Lorentzweg 1
 
 2628LJ Delft
 
 the Netherlands
 
 Tel: +31 15 2781175
 
 Fax:+31 15 2781202
 
 e-mail: t.k.s.janss...@tudelft.nlmailto:t.k.s.janss...@tudelft.nl
 
 
 
 
  [[alternative HTML version deleted]]
 
 ___
 R-sig-phylo mailing list
 R-sig-phylo@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
 
 ___
 R-sig-phylo mailing list
 R-sig-phylo@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-phylo






[[alternative HTML version deleted]]

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] fitContinuous in geiger: positive log-likelihoods when trait values 1

2011-03-07 Thread Dan Rabosky

Hi Nick-

Are you are getting differences in relative AICs between models from simple 
rescaling (multiplying by a constant)? 

The actual values of the traits *might* matter for optimization, depending on 
various parameters associated with optimization (and whatever algorithm is 
being used - this should be L-BFGS-B for fitContinuous, I think). So...if 
relative AICs are different, the first thing I would check is whether some of 
your model AICs reflect local optima. Do lots of optimizations with random 
starting parameters, which is usually sufficient - and failing that, you can 
get into the guts of optim and mess with the actual arguments that control the 
optimization (e.g., parscale etc). 

FYI, this is not immediately transparent in Geiger, as many of the functions 
are hidden. To get at the optimization, look at

geiger:::fitContinuousModel

which does the heavy lifting within fitContinuous, and if you need to, you can 
recode the relevant call to optim to have a bit more flexibility. 

~Dan




On Mar 7, 2011, at 4:04 PM, Nick Matzke wrote:

 Doh!  Really should have remembered that, likelihoods-can-be-greater-than-1 
 is likelihood 101...
 
 I am still a little puzzled by the dramatically different results between 
 rescaling and not, will try to post an example in a sec...
 






[[alternative HTML version deleted]]

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] phylogenetic analysis of ratios

2009-10-06 Thread Dan Rabosky

Hi Ted and others -

Here's a further complication. The raw data are actually species'  
responses to different treatments, such that you have something like:

(observed mass) = (species mass) + (response to treatment X) +  
(response to treatment Y) + (interactions) + etc.

Lets say Y_i_j_k is the response of species i to treatment i and k, k  
and i take values of 0 or 1 depending on whether species receives the  
treatment.

So, Y_i_j_k = mass_i + X_j + Y_k + ...

And we are actually interested in analyzing response variables  
(involving ratios) that are themselves functions of the Y values  
themselves. I think that we should assume that the treatment effects  
themselves are to be modeled on the phylogeny, not the response  
variables. So, you might imagine that (under the null hypothesis)  
observed mass Y_1, 1, 1 is the sum of 3 random variables drawn from a  
multivariate normal distribution: species_mass, X, and Y. These 3  
should have the same underlying variance-covariance structure, but  
different constants of proportionality, although Y_1_0_0 would only be  
drawn only from the 'species_mass' distribution.

This has me stumped. Maybe I am making this too hard.

~Dan





On Oct 5, 2009, at 1:34 PM, tgarl...@ucr.edu tgarl...@ucr.edu wrote:

 The idea of simulating the underlying raw data seems good to
 me, and should be pretty bullettproof.

 I'd be curious to know what, exactly, the ratios are
 representing.  I think a lot about this for things like
 organ mass/body mass or leg length/body length, and am
 always interested to learn how others deal with these sorts
 of things.  Feel free to go off line for explanation if
 you think it would bore this list.

 Cheers,
 Ted

 Theodore Garland, Jr., Ph.D.
 Professor
 Department of Biology
 University of California
 Riverside, CA 92521
 Phone:  (951) 827-3524 = Ted's office (with answering machine)
 Phone:  (951) 827-5724 = Ted's lab
 Phone:  (951) 827-5903 = Dept. office
 Home Phone:  (951) 328-0820
 FAX:  (951) 827-4286 = Dept. office
 Email:  tgarl...@ucr.edu
 http://biology.ucr.edu/people/faculty/Garland.html
 List of all publications with PDF files:
 http://biology.ucr.edu/people/faculty/Garland/GarlandPublications.html

 Garland, T., Jr., and M. R. Rose, eds. 2009. Experimental evolution:  
 concepts, methods, and applications of selection experiments.  
 University of California Press, Berkeley, California.
 http://www.ucpress.edu/books/pages/10604.php

 Associate Director
 Network for Experimental Research on Evolution
 http://nere.bio.uci.edu/
 (A University of California Multicampus Research Project)



   Original message 

Date: Mon, 5 Oct 2009 16:23:24 -0400 (EDT)
From: Daniel Lee Rabosky dl...@cornell.edu
Subject: [R-sig-phylo] phylogenetic analysis of ratios
To: r-sig-phylo@r-project.org

 Howdy-

 I have a two-part problem involving the phylogenetic
analysis of ratios.
 Specifically, I'm interested in the correlation between
two ratios X and
 Y.

 The problem is actually more complicated than this,
because ratios X and Y
 depend in part on the same underlying data. My
understanding is that this
 is not a simple dependency, eg with X = a/c and Y = b/c,
in which case I
 think log-transforming X and Y would enable us to simply
analyze the
 relationship between log(a) and log(b). [I'm asking on
behalf of a
 collaborator, so I don't have the actual data in front of
me].

 One solution might involve parametrically simulating the
underlying raw
 data a, b, c etc (independently) under fitted models of
phenotypic
 evolution, then for each replicate calculating the ratios
X and Y and
 using them to construct the null distribution of
correlations. This would
 maintain the variance-covariance structure of the
underlying data, and
 should control for any non-independence due to the use of
shared
 underlying data in X and Y.

 Any thoughts?

 ~Dan Rabosky

 ___
 R-sig-phylo mailing list
 R-sig-phylo@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-phylo





[[alternative HTML version deleted]]

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


[R-sig-phylo] phylogenetic analysis of ratios

2009-10-05 Thread Dan Rabosky

Howdy-

I have a two-part problem involving the phylogenetic analysis of  
ratios. Specifically, I'm interested in the correlation between two  
ratios X and Y.

The problem is actually more complicated than this, because ratios X  
and Y depend in part on the same underlying data. My understanding is  
that this is not a simple dependency, eg with X = a/c and Y = b/c, in  
which case I think log-transforming X and Y would enable us to simply  
analyze the relationship between log(a) and log(b). [I'm asking on  
behalf of a collaborator, so I don't have the actual data in front of  
me].

One solution might involve parametrically simulating the underlying  
raw data a, b, c etc (independently) under fitted models of phenotypic  
evolution, then for each replicate calculating the ratios X and Y and  
using them to construct the null distribution of correlations. This  
would maintain the variance-covariance structure of the underlying  
data, and should control for any non-independence due to the use of  
shared underlying data in X and Y.

Any thoughts?

~Dan Rabosky




[[alternative HTML version deleted]]

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


[R-sig-phylo] PHylogenetic analysis of ratios

2009-10-05 Thread Dan Rabosky

Howdy-

I have a two-part problem involving the phylogenetic analysis of  
ratios. Specifically, I'm interested in the correlation between two  
ratios X and Y.

The problem is actually more complicated than this, because ratios X  
and Y depend in part on the same underlying data. My understanding is  
that this is not a simple dependency, eg with X = a/c and Y = b/c, in  
which case I think log-transforming X and Y would enable us to simply  
analyze the relationship between log(a) and log(b). [I'm asking on  
behalf of a collaborator, so I don't have the actual data in front of  
me].

One solution might involve parametrically simulating the underlying  
raw data a, b, c etc (independently) under fitted models of phenotypic  
evolution, then for each replicate calculating the ratios X and Y and  
using them to construct the null distribution of correlations. This  
would maintain the variance-covariance structure of the underlying  
data, and should control for any non-independence due to the use of  
shared underlying data in X and Y.

Any thoughts?

~Dan Rabosky




[[alternative HTML version deleted]]

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] age of ancestral node

2009-04-14 Thread Dan Rabosky

Hi Manoel-

It isn't entirely clear what you want to do. Do you want to, for each  
node in the list, (a) find the parent node, then (b) find the age of  
that parent node?

The basic operation is straightforward:

Given some node x and your tree:

parent - tree$edge[,1][tree$edge[,2]==x]
# gives the parent node of node x

branching.times(tree)[as.character(parent)]
# gives the divergence time of the parent node

You can loop over your list using the above operations.

~Dan


On Apr 14, 2009, at 6:19 PM, Manoel Silva wrote:



 Ok, here's my problem. I have a list of nodes in an ultrametric  
 tree, and I'd like to generate a vector of the ages of its  
 immediate ancestral nodes. I'm sure there is a simple way to do  
 this, but I'm stuck. Any hints?

 Manoel



   [[alternative HTML version deleted]]

 ___
 R-sig-phylo mailing list
 R-sig-phylo@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

Dan Rabosky
Department of Ecology and Evolutionary Biology 
Fuller Evolutionary Biology Program
Cornell Lab of Ornithology
Cornell University






[[alternative HTML version deleted]]

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo