[R-sig-phylo] Evolutionary Quantitative Genetics Workshop 2023

2023-03-06 Thread Joe Felsenstein
A reminder: applications due March 31.

The Evolutionary Quantitative Genetics Workshop for 2023 will take
place at Friday Harbor Laboratories of the University of Washington
from June 4-9.  The workshop will review the basics of theory in the
field of evolutionary quantitative genetics and its connections to
evolution observed at various time scales. One aim of the workshop is
to build a bridge between the traditionally separate disciplines of
quantitative genetics and phylogenetic comparative methods. The
workshop involves lectures, discussions and in-class computer
exercises.  This workshop has been given yearly since 2011, and at
Friday Harbor Laboratories since 2017. It was canceled in 2020, and
given as an online workshop in 2021 and 2022.  It is intended for
graduate students, postdocs, and junior faculty.  Information on the
subject area, lecturers, costs and application form will be found at:

https://fhl.uw.edu/courses/course-descriptions/course/evolutionary-quantitative-genetics-workshop-2023/

and more information, including details of the Workshop in recent
years, will be found at

https://eqgw.github.io

Joe Felsenstein and Steve Arnold
-- 
Joe Felsensteinfelse...@gmail.com,   felse...@uw.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA
-
 PS: please do not use  j...@gs.washington.edu, which is an alias
 that some mail systems now mistake as indicating spam.

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] Evolutionary Quantitative Genetics Workshop 2023

2023-01-27 Thread Joe Felsenstein
K-Hz7m0Vt54!lAc-_tyBKyUS0BwdXHfS_bddA8mSgcsrMp6Nly1GJM-UkNMs09pABZP0gIb89WvlGmsVldrV5B-5tMdJeYl2P7XvDP20eBi7Dw$
>
> Introduction to Bayesian modelling with INLA (BMIN02)
> May 22nd - 26th
> https://urldefense.com/v3/__https://www.prstatistics.com/course/introduction-to-bayesian-modelling-with-inla-bmin02/__;!!K-Hz7m0Vt54!lAc-_tyBKyUS0BwdXHfS_bddA8mSgcsrMp6Nly1GJM-UkNMs09pABZP0gIb89WvlGmsVldrV5B-5tMdJeYl2P7XvDP09ie8kTQ$
>
> Reproducible and collaborative data analysis with R (RACR02)
> June 13th - 15th
> https://urldefense.com/v3/__https://www.prstatistics.com/course/reproducible-and-collaborative-data-analysis-with-r-racr02/__;!!K-Hz7m0Vt54!lAc-_tyBKyUS0BwdXHfS_bddA8mSgcsrMp6Nly1GJM-UkNMs09pABZP0gIb89WvlGmsVldrV5B-5tMdJeYl2P7XvDP2JO-LW_Q$
>
> Species Distribution Modelling With Bayesian Statistics Using R (SDMB05)
> September 4th - 8th
> https://urldefense.com/v3/__https://www.prstatistics.com/course/online-course-species-distribution-modelling-with-bayesian-statistics-using-r-sdmb05/__;!!K-Hz7m0Vt54!lAc-_tyBKyUS0BwdXHfS_bddA8mSgcsrMp6Nly1GJM-UkNMs09pABZP0gIb89WvlGmsVldrV5B-5tMdJeYl2P7XvDP0ZmkHHkg$
>
> Multivariate Analysis Of Ecological Communities Using R With The VEGAN
> package (VGNR05)
> September 18th - 22nd
> https://
> https://urldefense.com/v3/__http://www.prstatistics.com/course/multivariate-analysis-of-ecological-communities-using-r-with-the-vegan-package-vgnr05/__;!!K-Hz7m0Vt54!lAc-_tyBKyUS0BwdXHfS_bddA8mSgcsrMp6Nly1GJM-UkNMs09pABZP0gIb89WvlGmsVldrV5B-5tMdJeYl2P7XvDP3z5_1e7g$
>
> --
>
> Oliver Hooker PhD.
> PR statistics
>
> [[alternative HTML version deleted]]
>
> ___
> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> https://urldefense.com/v3/__https://stat.ethz.ch/mailman/listinfo/r-sig-phylo__;!!K-Hz7m0Vt54!lAc-_tyBKyUS0BwdXHfS_bddA8mSgcsrMp6Nly1GJM-UkNMs09pABZP0gIb89WvlGmsVldrV5B-5tMdJeYl2P7XvDP1_L29Apw$
> Searchable archive at 
> https://urldefense.com/v3/__http://www.mail-archive.com/r-sig-phylo@r-project.org/__;!!K-Hz7m0Vt54!lAc-_tyBKyUS0BwdXHfS_bddA8mSgcsrMp6Nly1GJM-UkNMs09pABZP0gIb89WvlGmsVldrV5B-5tMdJeYl2P7XvDP1XIWsucQ$



-- 
Joe Felsensteinfelse...@gmail.com,   felse...@uw.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA
-
 PS: please do not use  j...@gs.washington.edu, which is an alias
 that some mail systems now mistake as indicating spam.

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] phylogenetic correlation analysis

2021-06-13 Thread Joe Felsenstein
Whether one gets them from PGLS directly or from contrasts, one can
get correlations by just inferring the covariance matrix and then
calculating
r(x,y) =  Cov(x,y) / (Var(x) Var(y))^(1/2)
where of course Var(x) is also  Cov(x,x), and so on.
You would not need a separate run to get correlations.

Joe
--
Joe Felsenstein felse...@gmail.com,  j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] If my trait X cannot be regressed by body size, how can I rescue residuals corrected by the phylogeny and SE?

2021-05-27 Thread Joe Felsenstein
Gabriel Ferreira explained:


> I will try to better explain my problem, and I really appreciate your time
> to help me with this issue.
> My study is a conventional ecomorph with linear and univariate
> measurements
>

> So... Some of my traits are linear measures that can and must be
> "corrected" by body size, such as tail length. I usually conduct such body
> size corrections with phylogenetic regressions using *gls *func. from
> *nlme*:
>

 lots of details ...


> So I could use the residuals of this regression in the phy ANOVAs ... Does
> it make sense?
>


Well, I am afraid I am lost.  Perhaps someone else here could explain the
issues to me ...


Joe
--
Joe Felsenstein felse...@gmail.com,  j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] If my trait X cannot be regressed by body size, how can I rescue residuals corrected by the phylogeny and SE?

2021-05-27 Thread Joe Felsenstein
If wants residuals of values of a trait in each species, taking into
account within-species variation and phylogeny, what does it mean if those
residuals correlate with those of some other character, or with an
environmental variable?

Just asking which R machinery to use might wait until it is clear what the
intended task is and why that makes sense.

Joe
--
Joe Felsenstein felse...@gmail.com,  j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] units of sigsq

2021-03-19 Thread Joe Felsenstein
Marguerite asked:

> First - Joe - what do you mean by log(grams) has no units? The units of grams 
> is a unit, so log(mass) will have units of log-gm.  As log is not the same as 
> 1/gm, log(gm) cannot be unit-free.

I looked it up on Wikipedia, and was assured by it that Marguerite is
right, log(weight) has the same units as weight, except you're
supposed to call them log-gm.


I do think that unless there is a calibration with time, the tree
branch lengths are not in units of time but in units of base
substitutions per site.

And I remain confused on what the units are, if you compute a linear
combination such as  2 log(wt) - 3 log(height). Which princip[al, le]
components machinery does.

Joe
--
Joe Felsenstein felse...@gmail.com,  j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] units of sigsq

2021-03-19 Thread Joe Felsenstein
Ted Garland asked:

OK, Joe, that's for one trait at a time.
> Would you please continue your discourse, but extend to multiple traits
> and their covariances
>

OK, assuming that’s not a joke which it seems it was.  If all characters
are log of something, their variances all have units of sites squared per
square substitution.


But if you have different units in different characters each variance would
have units. (CharXunit) squared times sites-squared per square
substitution.  A covariance would be (CharXunit times CharYunit) times
sites squared per square substitution.

If you had a principal component (usually misnamed a “principle” component)
it is in terms of a linear combination of characters, and I am deeply
puzzled how to give their units as they mix them.

Joe

-- 

Joe Felsenstein felse...@gmail.com,  j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] units of sigsq

2021-03-19 Thread Joe Felsenstein
Folks --

If the trait is  log(grams)  then the trait is unit-free.   The "time"
is probably branch length from a phylogeny.  That in turn (from DNA
data) is usually   DNA substitutions per site.

So the units of the standard deviation aresites per substitution.
But this is not the standard deviation, it is its square.

So (wait for it ...)square sites per square substitution

Now that is pretty weird.  But years ago people pointed out to me that
quantitative geneticists were accustomed to inferring variance
components of crop yield.  The yield might be in  bushels per acre.
So the units of its variance was:  square bushels per square acre.
Don't even try to think about how you square a bushel, or how many
dimensions you have to go into to square an acre.   Actually, you can
think about them: a bushel is three-dimensional volume, and an acre is
two dimensional area.  So crop yield has units of  meters, and
variance of crop yield should have units of  square meters.

That way lies madness ...

Joe
-
Joe Felsenstein felse...@gmail.com,  j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] collapsing sets of nodes based on label values

2020-06-02 Thread Joe Felsenstein
Karla Shikev --

> I just want to take a tree with node labels (e.g.
> bootstrap values from a RAxML analysis) and collapse all nodes with support
> values below, say, 90%.


People here have given you various ways of doing this in R.  I am not
very familiar with the R packages, but just wanted to add one more
way.  If you have the original set of trees that were used to get the
bootstrap values, you can do a version of the bootstrap that makes a
tree which shows only the groups that are present in 90% or more of
the input trees.  This is called an M(0.90) consensus tree (the 0.90
is in a subscript which I can't typeset in an email).  It is available
in the program Consense in my PHYLIP package, and also in some other
phylogeny packages.   If you prefer R you can use Liam Revell's
Rphylip package, together with installing a copy of PHYLIP, as Rphylip
serves as a front-end for PHYLIP within R.

J.F.
-
Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] Multivariate ASR with Discrete Characters

2020-03-04 Thread Joe Felsenstein
William Gearty --

>
> Thank you Don and Julien for your suggestions. I was able to run threshml
> using Rthreshml from the Rphylip package. However, now I'm not really sure
> what to do with the results. Can I use the output to perform ancestral
> state reconstruction/estimation for the continuous (3) and discrete (1)
> characters? I see that phytools has ancThresh, but that only seems to work
> with a single character.

I don't know what are the limitations of Rphylip's interface
with Threshml, but I can address Threshml itself -- as I
wrote it.

In principle the liabilities sampled at the interior
nodes of the tree could be used to do ancestral
reconstructions at those nodes, in a very straight-
forward way.  Just consider them, not the rest of
the tree, and make a histogram of them at the node.

However I never bothered to put in any interface
to do that.  However note that what Threshml does
is to transform the liabilities to independent variables,
using the current best estimate of the covariance
matrix of liabilities.  The Gibbs sampling (and at the
tips, the Metropolis/Hastings sampling) occurs in
the independent characters.  Then you have to
transform back to the liabilities before you have
samples for those at the interior nodes.

So matters are not simple, but in principle it can
be done.  Note that for continuous traits Threshml
can also be used to sample values at the interior
nodes.  Of course this is a lot more computation
than just using likelihood computations -- I've
checked and the sampling does infer the same
covariances in that case.  In this all-continuous
case the same issue of transforming to independent
characters also comes up.

Joe

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] Phylogenetic varying slopes and intercepts model

2019-09-10 Thread Joe Felsenstein
Oscar Inostraza --

Is the variable  x also evolving on the tree?  If so you need to use
standard phylogenetically-informed comparative methods to estimate the
variances and covariances of changes in both characters.

You may not be able to assume that  y   responds instantly to  x.

J.F.

Joe Felsenstein, j...@gs.washington.edu
Department of Genome Sciences, University of Washington, Box 355065,
Seattle, WA. 98195-5065 USA

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] Nantucket Phylogenetics DevelopeR Workshop

2019-08-08 Thread Joe Felsenstein
>
> We are very happy to announce the 2nd edition of a graduate-level
> workshop on phylogenetic method development R. The course will be four
> days in length and take place at the University of Massachusetts
> Boston's Nantucket Field Station from the 5th to the 8th of November,
> 2019.

Between this and the Workshop on Molecular Evolution (Woods Hole), the
Applied Phylogenetics Workshop (Bodega Bay, California), and the
Evolutionary Quantitative Genetics Workshop (Friday Harbor
Laboratories), there seems to be a major competition to see who can
have their course in the most picturesque marine laboratory.

Joe
----
Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] Link between Mahalanobis distance and PGLS

2019-01-04 Thread Joe Felsenstein
Guillaume Louvel asked:

>
> So my first question is: can we directly apply the Mahalanobis distance
> to measure a kind of "phylogeny-corrected" distance between 2 vectors of
> trait values for a list of species? Since we assume a brownian motion,
> we know these vectors should be drawn from a multivariate normal
> distribution with known covariance matrix. Therefore the Mahalanobis
> distance seems perfectly appropriate to me, is it the case?

It is appropriate.  In fact this is in effect what
regressing contrasts in trait Y on contrasts in
trait X is doing.  One can alternatively use a
multivariate regression appoach, which is what
Alan Grafen (1989) did, and the results are
the same either way (in the simplest cases).

Note that although the contrasts can be
treated as independent observations, that is
not true for the tip species values  -- the
Grafen "Phylogenetic Regression" does not
treat the tip values as independent, and for
the same reason pairwise distances between
tips are not independent.


>
> I don't want to do a statistical test per se, I am rather interested in
> ranking many traits according to their distance to a pattern of reference.

I am unclear about what that means.

>
> My second subsidiary question is: can I apply this Mahalanobis distance
> if my traits are binary (e.g. presence-absence of some sequence in the
> genomes). In that case I know that my trait is not multivariate normal,
> but considering that I have millions of traits, shouldn't I expect the
> whole set to have some normal characteristics?

Basically no.  Although people have approximated
binary traits by Gaussian variables (I think Paul
Harvey and Mark Pagel did in their 1991 book),
it is much more appropriate to use a threshold
model.  See my 2012 paper in American Naturalist
or the earlier 2005 sketch of the method in
Proc. Royal Society of London series B.

A  good paper agonizing about all this is:

Maddison WP & FitzJohn RG. 2015. The unsolved challenge to
phylogenetic correlation tests for categorical characters. Systematic
Biology 64: 127–136

though I'd say that the problem is not as "unsolved" as they think.



> Finally, if none of the approach above is justified, is there a
> multivariate phylogenetic method for discrete/binary traits? Some kind
> of adapted phylogenetic PCA ?

See above.  It does require MCMC, and cannot
simply be done with distances.

J.F.

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] Extended Majority Consensus Topology (Allcompat Summary) in R? And some observations on ape's consensus() function

2018-10-27 Thread Joe Felsenstein
David Bapst asked:

>
> I was interested if anyone was familiar with R code that can estimate an
> extended majority consensus tree (referred to as an 'allcompat' tree by the
> sumt command in MrBayes)? This is a fully bifurcating summary of a tree
> posterior, where each clade is maximally resolved by the split that is most
> abundant in the considered post-burn-in posterior (i.e., that split which
> has the plurality, if not the majority - the highest posterior probability
> of any other competing, conflicting splits recovered within the posterior.
> So, I guess one could also call these plurality consensus trees...).

You can also get the Extended Majority Rule consensus tree from the
Rconsense function in Liam Revell's package Rphylip, which is calls
programs from my PHYLIP package.  Consense in PHYLIP does output, in
addition to the consensus tree itself, a list of partitions found in
the input trees, and the frequency of each.  Rconsense may be able to
do that too.

Speaking as the one who defined the EMR method, I need to make a
warning:  EMR makes a tree by ordering the partitions (splits) in
order of their frequency.  To make Margush and McMorris's Majority
Rule consensus tree one simply goes down this list and takes all the
partitions that occur more than 50% of the time.  EMR continues
further, in order to resolve the tree fully.  It keep accepting
partitions as long as they don't conflict with anything already
accepted.

But this means that two partitions could be found (say) 45% of the
time each.   They could both be compatible with the partitions in the
majority-rule tree, but in conflict with each other.  Which then gets
included then depends only on which one is encountered first as one
goes down the list of most-frequent partitions.  And that will just be
a matter of things like the order in which the tree containing them
occurs among the input trees.  That is one of the limitations of the
EMR method.

Note also that EMR is subtly different from finding the largest set of
split (partitions) that are all mutually compatible.  That will often
be the same, but not always.  The latter is called the Nelson
Consensus Tree.

Joe

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] Estimating marginal ancestral states under a non-reversible model of evolution.

2018-09-18 Thread Joe Felsenstein
Jordan Gault asked:

> I would like to do a worked example for an unequal-rates model as well but my 
> understanding is that the re-rooting method is inappropriate for 
> non-reversible models of trait evolution. How are marginal ancestral states 
> estimated for non-reversible models of trait evolution?

Why do "unequal rates" make the model nonreversible?

I assume that by unequal rates you mean not
rates different in different sites, but forward and
backward rates different at a single site.

If that is the case, the equilibrium frequencies
of the two states become correspondingly
different, and the model is still reversible.  The
tree can be rerooted at any interior node
and the marginal ancestral states at that node
found by the usual likelihood "pruning"
algorithm.

J.F.

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] estimate ancestral state with OUwie models

2018-06-12 Thread Joe Felsenstein
Let me add more warnings to Marguerite and Thomas's excellent
responses.   People may be tempted to infer ancestral states and then
treat those inferences as data (and also to infer ancestral
environments and then treat those inferences as data).  In fact, I
wonder whether that is not the main use people make of these
inferences.

But not only are those inferences very noisy, they are correlated with
each other.  So if you infer the ancestral state for the clade (Old
World Monkeys, Apes) and also the ancestral state for the clade (New
World Monkeys, (Old World Monkeys, Apes)) the two will typically not
only be error-prone, but will also typically be subject to strongly
correlated errors.  Using them as data for further inferences is very
dubious.  It is better to figure out what your hypothesis is and then
test it on the data from the tips of the tree, without the
intermediate step of taking ancestral state inferences as
observations.

The popular science press in particular demands a fly-on-the-wall
account of what happened in evolution, and giving them the ancestral
state inferences as if they were known precisely is a mistake.

Joe

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] Detection association between traits

2018-03-28 Thread Joe Felsenstein
Rafa Braga asked

>
> Does anybody know how to detect association between two traits while removing 
> the influence of the third one? I have three traits which are named a b and 
> c. I'm wanting to test the correlation of a and b while controling c as a 
> covariate. In other words, the correlation between trait a and b can be a 
> result of the correlation between b and c and I'm wanting to remove the 
> influence of c on b.


This goes all the way back 100 years.

Once you have inferred the variances and covariances of
the three traits, you can compute the partial correlation
between  a  and  b.

This is defined as the correlation between the residual
of  a  when predicted by  c,  and the residual of  b  when
predicted by  c.

When there is only one variable  c,   and when
r(a,b)  is the correlation of  a  with  b, then the
partial correlation is

[ r(a,b) - r(a,c) r(b,c)] / [(sqrt(1 - r(a,c)^2) sqrt(1 - r(b,c)^2)]

For likelihood ratio testing in a linear model,
one could compare the likelihoods of models
that did or did not have direct connection of
a  and  b.  RA Fisher also derived a distribution
of the partial correlation coefficient.


Joe

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA


>
>
>
> Yours,
> Rafa
>
> [[alternative HTML version deleted]]
>
> ___
> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/



-- 

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] Phylogenetic PCA and measurement error

2018-03-12 Thread Joe Felsenstein
Rafael and Liam --

>
> So far as I know, there is currently no way to explicitly take into account
> sampling error in computing principal components while also accounting for
> the phylogeny. However, it is relatively straightforward to compute scores
> for individuals from a PCA conducted on species means.
>
> This would look as follows (in which Xm is a matrix containing values for
> species for each trait, and Xi is a matrix with the same number of columns
> but containing values for individuals):
>
> pca<-phyl.pca(tree,Xm)
> Si<-Xi%*%pca$Evec
>
> Then, if you have a separate vector containing species ID as a factor, you
> could compute means and variances for each component by species.


There are methods (not all implemented in R) for taking into
account the within-species phenotypic covariation among
individuals and also the evolutionary covariances between
species (on a phylogeny).  These include the method of
Ives, Midford, and Garland (2007) and my own method
(2008).  The former assumes you know the within-species
covariance matrix, the latter estimates it from the sampled
individuals for each species.  Both of these assume that the
true within-species phenotypic covariance matrices are the
same for all species.

For my method, you can use Liam's package Rphylip to
call my program Contrast if you also have PHYLIP installed.

Once you infer these covariance matrices, those are the
relevant sufficient statistics (if the distributions are
multivariate normal).  The PCA axes for either covariance
matrix can then be computed from those, or the
canonical variates axes, which are the principal components
of the between-species covariation relative to the
within-species covariation.

You don't need to use the PCA machinery until after you
estimate these two covariance matrices.

Joe

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] phylogenetic signal with sample sizes but no standard errors

2017-08-04 Thread Joe Felsenstein
There is also my C program Contrast, which implements a method from a
2008 paper I wrote:

Felsenstein, J.  2008.  Comparative methods with sampling error and
within-species variation: contrasts revisited and revised. American
Naturalist 171: 713-725.

This estimates the within-species covariances and the between-species
evolutionary variances.It is not an R program but can be accessed
through Liam Revell's package Rphylip, if you also have my (non-R)
package PHYLIP installed.

A pretty good (but not ML) estimate of the within-species phenotypic
variance can be gotten by pooling the within-species sampling error.
The harder part is using that to correct one's estimate of the
covariances of the between-species change, which using ordinary
methods will have some within-species variation mixed in.

Joe

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] phylip file compression ratios

2017-04-18 Thread Joe Felsenstein
Jacob Berv noted:

>
> I noticed today that the compression ratio for an interleaved phylip file 
> (zip compressed) was about 84:1, (390MB uncompressed —> 4.6MB compressed) 
> whereas the compression ratio for the same data non-interleaved was a much 
> worse 3.4:1 (390 MB uncompressed —> 113.9 MB). Not knowing much about how zip 
> compression actually works - I thought this might be an interesting 
> observation for the group…


Interleaved sequences have blocks of (say) 50 bases.  Successive lines
may repeat a whole block or nearly repeat it.  I wonder whether that
makes the interleaved format easier to compress.

I would guess that the compressibility of interleaved sequences would
be highest when the sequences are closely related.  In that case there
would be 50-base blocks of nearly identical sequences.  With less
closely related sequences the compressibility should be much lower.

Joe

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] Average Aminoacid Identity tree with bootstrap support. Is it possible?

2017-04-07 Thread Joe Felsenstein
Salvador Espada Hinojosa --

The problem, of course, is that a random substitution on an interior
branch of a tree increases or decreases the size of more than one
distance in the distance matrix.  The distances aren't independent in
their statistical noise.  So you can't just sample distances after the
distance matrix is computed.

J.F.

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] HKY GTR distances

2017-02-03 Thread Joe Felsenstein
Emmanuel wrote:

>
> There is no distance formula for HKY or GTR model. For GTR, Rodrı́guez et
> al. developed a procedure to calculate a distance (also in Yang's 2006
> book). An example is given below with the woodmouse:
>
> matlog <- function(x) {
> tmp <- eigen(X)
> V <- tmp$vectors
> U <- diag(log(tmp$values))
> V %*% U %*% solve(V)
> }
>
> tr <- function(x) sum(diag(x))
>
> data(woodmouse)
>
> PI <- diag(base.freq(woodmouse[1:2, ]))
> Ft <- Ftab(woodmouse[1:2, ])
> F <- Ft/sum(Ft)
> X <- solve(PI) %*% F
> -tr(PI %*% matlog(X))
>
> You have to call this code for each pair of sequences (or wrap it in a
> function).
>
> For HKY, Yang mentioned a procedure Rzhetsky & Nei (1994, J Mol Evol).
>

Strictly speaking there *is* a formula for GTR, which I think may be
equivalent to the above.  The formula is given in my book "Inferring
Phylogenies" on page 209 as equation 13.24.  Note that for both of
these, the equilibrium frequencies of the bases are estimated from the
empirical frequencies in the two sequences.  That means that for each
distance, the equilibrium frequencies are somewhat different, as
different sequences are being used to estimate the base frequencies.
We are not, in the use of these formulas, estimating one overall set
of equilibrium frequencies and then using that for all the the
distances in our data set.

The Rzhetsky-Nei 1994 distance formula is, I believe, an
approximation, though probably a very good one.  It is not quite the
same as the estimate you would get by maximizing the likelihood.

Distance formulas that involve empirically estimated base frequencies,
which all of these do, have in common the problem that one either
estimates those separately for each pair of sequences, or jointly
estimates them from the whole dataset, without using a tree in the
process.

J.F.

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

[R-sig-phylo] Fwd: Workshop: UWashington.EvolQuantGenetics.Jun5-9

2017-01-31 Thread Joe Felsenstein
Evolutionary Quantitative Genetics Workshop

Friday Harbor Laboratories, University of Washington, 5-9 June 2017


Non-credit workshop.
Participants arrive at Friday Harbor Labs on Sunday, June 4, lectures and
exercises occur June 5-9, and participants depart on Saturday, June 10, 2017.

Application deadline March 1, 2017.
Application forms and details here:
  http://tinyurl.com/EQG2017
Web page:
  http://depts.washington.edu/fhl/studentSummer2017.html#SumB-genetics

Instructors:

Dr. Joe Felsenstein
Department of Genome Sciences
University of Washington, Seattle
j...@gs.washington.edu

Dr. Stevan J. Arnold
Department of Integrative Biology
Oregon State University, Corvallis
stevan.arn...@oregonstate.edu


Previous versions of this 5-day workshop were given at the National
Evolutionary Synthesis Center (NESCENT) at Duke University in Durham, North
Carolina from 2011-2013, and then at the National Institute for Mathematical
and Biological Synthesis (NIMBioS) at the University of Tennessee in Knoxville
from 2014-2016. Like past versions, the Friday Harbor version will review the
basics of theory in the field of evolutionary quantitative genetics and its
connections to evolution observed at various time scales. The aim of the
workshop is to build a bridge between the traditionally separate disciplines of
quantitative genetics and comparative methods.

Quantitative genetic theory for natural populations was developed considerably
in the period from 1970 to 1990 and up to the present, and it has been applied
to a wide range of phenomena including the evolution of differences between the
sexes, sexual preferences, life history traits, plasticity of traits, as well
as the evolution of body size and other morphological measurements. Textbooks
have not kept pace with these developments, and currently few universities
offer courses in this subject aimed at evolutionary biologists. Evolutionary
biologists need to understand this field because of the ability to collect
large amounts of data by computer, the development of statistical methods for
changes of traits on evolutionary trees and for changes in a single species
through time, and the realization that quantitative characters will not soon be
fully explained by genomics. This workshop aims to fill this need by reviewing
basic aspects of theory and illustrating how that theory can be tested with
data, both from single species and with multiple-species phylogenies.
Participants will use R, an open-source statistical programming language, to
build and test evolutionary models.

The workshop involves lectures and in-class computer exercises. You can
consult the 2016 tutorial website for examples, it will be found at
  http://www.nimbios.org/tutorials/TT_eqg2016
The intended participants for this
tutorial are graduate students, post-docs, and junior faculty members in
evolutionary biology. The workshop can accommodate up to 35 participants. Guest
instructors will include:

* Marguerite Butler, Biology, Univ. Hawai'i, Mānoa
* Patrick Carter, Evolutionary Physiology, Washington State University, Pullman
* Adam Jones, Biology, Texas A University, College Station
* Brian O'Meara, Ecology & Evolutionary Biology, Univ. of Tennessee, Knoxville
* Josef Uyeda, Biological Sciences, Virginia Tech, Blacksburg

Cost: $1000 to be paid to Friday Harbor Laboratories. This fee will cover
housing and meals at FHL and all other workshop expenses, except travel.
Participants who have been admitted to attend will make their payment prior to
arrival at FHL. Details of payment by credit card or check will be provided
once the applicant has been admitted to attend.


J.F.
----
Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] simulating continuous data

2016-05-10 Thread Joe Felsenstein
Bryan McLean --


> I’m working to simulate multiple continuous characters on a known
> phylogeny (using several of the standard models), and I want to compare
> properties of the simulated datasets to an empirical dataset. My question
> is: what is the standard method for ensuring that those datasets
> (simulated, empirical) are actually directly comparable, i.e. scaled
> similarly? Does this involve specifying a sensible root state (e.g.
> ancestral reconstruction) OR just rescaling one or the other datasets
> before or after the analysis? Forgive me if this is a bit of a naive
> question, just trying to get a sense of standard practices.
>

It would seem to depend on what you consider to be "scaled similarly".

When I simulate multiple characters with correlated Brownian Motion, I
specify a covariance matrix for the evolutionary changes, as well as a
starting vector of means.  Using a matrix square root of the covariance
matrix, one can transform the characters so that the covariance matrix of
the new characters is an identity matrix.  Those are easy to simulate up
the tree, and then one transforms back to the original characters.

I do this with my own C programs, but it can be done in R too.

But what does it mean to be "scaled similarly" ?

Joe

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] BiSSE/MacroCAIC on Non-Ultrametric Tree with Polytomies?

2016-04-14 Thread Joe Felsenstein
Ted commented:


> Nice shot across the bow!
>

Well, thanks but I have been shooting and shooting and shooting, with
regard to issues like regressing one character on another and then
analyzing only the residuals phylogenetically, and in this case issues like
not thinking about how the environments changed along the tree.   Not much
effect so far so you can expect more shots across the bow.

Joe
----
Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] BiSSE/MacroCAIC on Non-Ultrametric Tree with Polytomies?

2016-04-14 Thread Joe Felsenstein
Brian Gill --

>
U between a discrete binary predictor
> (Latitude: Colorado/Ecuador) and a continuous response (species richness).
>
> e at the influence of a discrete
> binary trait on richness, but I'm not sure if my tree is suitable or if
> there is a better approach.
>
> 1) Diversitree package BiSSE
> 2) Caper package using MacroCAIC
>
> Any suggestions would be greatly appreciated-

And what do these methods assume about how that "discrete binary predictor"
evolves along the tree?

Joe
-
j...@gs.washington.edu
Joe Felsenstein, Department of Genome Sciences and Department of Biology
Box 355065, University of Washington, Seattle, WA  98195-5065

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] Testing for relationship between one categorical and one continuous variable in a phylogenetic framework.

2016-04-08 Thread Joe Felsenstein
Sean --

... or, if you want to do it *really correctly*, you can use the threshold
model of Sewall Wright for the discrete character and use the MCMC approach
that I proposed in 2012:

Felsenstein, J.  2012.  A comparative method for both discrete and
continuous characters using the threshold model. American Naturalist 179:
145-156.

which is implemented in my program Threshml which can be called from Liam
Revell's Phytools R package.  It also works for multiple threshold
characters and multiple continuous characters.

Joe

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] Determining Order of Trait Evolution

2016-04-04 Thread Joe Felsenstein
Gavin Leighton asked:

> >
> > > I have 500 trees of 80 species downloaded from birdtree.org and am
> > primarily interested in two traits. I have used PGLS to determine the
> > traits are related but would ideally like to test if there is an order to
> > trait evolution. To complicate matters one trait (Trait A) is continuous
> > while the second (Trait B) is presence/absence. I was hoping someone
> could
> > direct me to methods (assuming they exist) that would allow me to
> determine
> > the estimated value of Trait A before a gain of Trait B evolves in a
> > lineage.
> > >
>

A superior model (if I do say so myself) is the threshold model of Sewall
Wright (1934).  I have described in a paper in American Naturalist in 2012
how to use it to model discrete traits on a phylogeny.  It allows the case
of one trait continuous and another discrete.

It is implemented in my program Threshml, which should be callable from
Liam Revell's "phytools" R package.

However it does not precisely answer the question you posed, but just asks
whether the two traits evolve in a correlated fashion.

J.F.

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] [MORPHMET] model II regression statistics PAST

2016-03-18 Thread Joe Felsenstein

A few days ago, Jim Rohlf said, in response to Patrick Arnold's query about how 
to test the significance of major axis "regression"

>
> I don't know about how to do it in PAST, but note that because the slope of 
> the standard major axis "regression" line is just the signed ratio of two 
> standard deviations, its square follows the F-distribution if one assumes 
> that the two variances are based on random samples from two normally 
> distributed populations. Thus a test for a slope = 1 corresponds to a 
> 2-tailed test for the equality of variances.  One can easily look up 
> probabilities in an F-table or compute confidence limits using standard 
> methods. Note that if the samples are across different populations or 
> species, as in many (most?) morphometric applications, then these assumptions 
> do not hold.


This problem was brought to my attention by Paul Harvey in the late 1970s.
I suggested that he look for a rotation angle (theta) that would maximize the
likelihood under a model in which the two (new) variables are independent
Gaussian variables.  This also allows a likelihood ratio test of the assertion
that theta is zero.  The estimate of theta provides an estimate of the angle of
the major axis.  These can be easily generalized to multiple populations, even
when their variances are unequal.

The likelihood ratio test is done with a test for equality of correlation
matrices which will be found in a multivariate statistics book by Morrison.

I am not sure where Paul published this, but I think he did in an appendix to a
paper of his in some multiauthor volume.

J.F.

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] summary stats for comparative methods p-values

2016-03-10 Thread Joe Felsenstein
If the 100 trees are trees sampled from a Bayesian posterior, or else trees
from bootstrap samples of your data, then you might just take the estimates
from each tree (say estimates of a regression coefficient).  Consider their
distribution and ask whether the null hypothesis value (such as having
slope 0) is in the tail of that histogram.

The overall P value will be the proportion of estimated slopes that are
below zero (unless you want to do a two-tailed test).

Under the null hypothesis this will have the proper rejection probability.
As you take more and more trees the power increases some, but the type I
error rate stays the same.

If the 100 trees are something else, such as the personal opinions of 100
of your friends, then there is no statistical justification for this.

J.F.

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] Pairwise Distances

2016-02-18 Thread Joe Felsenstein
Pedro --

> Well, for now I just want to estimate the raw distances. But thinking
about coalescents is certainly interesting. Would you have a suggestion for
either ways of thinking?

Well, you must be wanting to use those raw distances to infer something.
Rates of migration?

There are extensive discussions in the book by John Wakeley.  Also in the
elementary population genetics text by Nielsen and Slatkin.

J.F.
----
Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] Pairwise Distances

2016-02-18 Thread Joe Felsenstein
Pedro Taucce --


> Is there a way to estimate pairwise distances between and within groups of
> sequences (in my case, each group is a species with lots of individuals)? I
> used to do it with MEGA, but now I use Linux only and MEGA isn't getting
> along with it.
>
> The closest way I figured out is the function dist.dna from the ape
> package. But I think it does not estimate distances between groups.
>

You want to use distances between groups?  But you don't want to think
about coalescents?

J.F.

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] Accounting for phylogeny in binary predictor, binary response data

2016-02-10 Thread Joe Felsenstein
I do not know offhand whether there is an R implementation, but how about
Mark Pagel's 1994 method for testing whether two 0/1 characters changing
along a ohylogeny are changing independently?

J.F.
-
j...@gs.washington.edu
Joe Felsenstein, Department of Genome Sciences and Department of Biology
Box 355065, University of Washington, Seattle, WA  98195-5065

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] Accounting for phylogeny in binary predictor, binary response data

2016-02-10 Thread Joe Felsenstein
I have received phishing spam through the R-sig-phylo mailing list
(pretending I had expressed interest in renting something from them) from

anya_phill...@casetotours.xyz

disguised as a reply to my comment.  So that address should immediately be
removed from the list.

Joe
-
j...@gs.washington.edu
Joe Felsenstein, Department of Genome Sciences and Department of Biology
Box 355065, University of Washington, Seattle, WA  98195-5065

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] testing for variation in rates of evolution among traits

2015-07-17 Thread Joe Felsenstein
Warning: You'd have to ensure that the traits for which you are comparing
rates are evolving independently, so that they do not covary in their
evolutionary changes.

I assume Dean Adams's paper involves some way of coping with this.  The
issue of log-transforms that Ted raised is very important, otherwise big
measurements will tend have higher rates of evolution.

Joe

j...@gs.washington.edu

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] [MORPHMET] Re: Stability of p-values (physignal and testing for morphological integration)

2015-06-08 Thread Joe Felsenstein
Alberto Gallano asked:


 Would this also be the case in the situation where n is small enough (~ 15)
 to enumerate all possible unique permutations? I was under the impression
 that such an 'exact' test provided the true p-value without error.

In a case that small, one might be able to evaluate all permutations.
(There are 1.3 x 10^12 permutations, but maybe you need combinations,
which are fewer).

Yes, you would then have an exact P value.  But of course that is not
the same as an infinitely powerful test -- after all an ordinary
t-test gets you an exact P value when its assumptions are met.

J.F.

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] [MORPHMET] Re: Stability of p-values (physignal and testing for morphological integration)

2015-06-08 Thread Joe Felsenstein
A number of people have suggested that P values should stabilize after a
number of samples (in a permutation test) that depends on the data set.

I suspect that these were unintended misstatements.  As Dennis Slice has
mentioned, one can regard each permutation in the permutation test as a
random sample from a distribution.  Comparing a test statistic X to its
value in the data (say, Y), each permutation draws from a distribution in
which there is a probability P that X exceeds Y.

So each permutation is (to good approximation) a coin toss with probability
P of Heads.  There obviously no number of tosses beyond which the fraction
of Heads stabilizes.  The fraction of heads after N tosses will depart
from the true value P by an amount which has expectation 0, and variance
P(1-P)/N.  This is a fairly slow approach of the fraction of Heads to the
true value.

So to get twice as close to the true P value, one needs 4 times as many
permutations.  And this need for more and more samples continues
indefinitely.  There is no sudden change as one reaches a threshold number
of permutations.

But that's what you really meant, right?

Joe
---
Joe Felsenstein  j...@gs.washington.edu

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] ancestor vs. change plots

2015-04-24 Thread Joe Felsenstein
Milton Tan asked:


 I have a question that is perhaps esoteric, since it's on a method I don't 
 see used often. I am looking at the dynamics of body size evolution, and have 
 come upon ancestor-vs-change plots described in Alroy 2000 (Understanding 
 the dynamics of evolutionary trends, Paleobiology). This is interesting 
 because it will allow me to see if rate of body size change depends on body 
 size. I haven't seen this method widely used, so for anyone unaware how this 
 works: for each branch, you plot the ancestral state vs. the amount/rate of 
 change along the branch.

If the question were just whether the rate of change of the character
(body size) depended on its value, then another way would be to look
for a transformation of body size that made the instantaneous variance
of change constant.  i.e., does change in  log(size)  or in
sqrt(size)  have constant variance?  There are parameterized
transformation such as   y = (x^p - x^{-p})/(2p)  or elsey = ( x^p
- 1) / p  + 1  for which you could estimate the parameter  p  by ML.

Joe

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] Sister-clade analysis of discrete characters

2015-04-16 Thread Joe Felsenstein
I mentioned earlier the alternative of having two-state discrete
characters each modeled by an underlying quantitative character with a
developmental threshold.

This can in principle be extended to three or more states.  In fact,
Sewall Wright's original example in 1934 had three states.  But there
are two difficulties:

1. How far away from each other the two thresholds would be becomes
something that needs to be estimated.  If there are only two states,
then since the underlying scale is not directly observed, one can
place the threshold anywhere on the underlying liability scale, most
conveniently at zero.  But how far from that the second, third,
(etc.), thresholds should be placed then needs estimation as
parameters of the model.

2. It is also not obvious whether they third state uses the same
underlying scale.  For three states, one could have two underlying
liability scales, with each state determined by some region in this
two-dimensional space.  Inferring that would be a continuous-state
counterpart to Mary Mickevich's transformation series analysis of
1982 (see my book, page 81 for references).

So thus far the threshold model implementations do not allow three or
more states.  But the possibilty should be mentioned here.


Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] Sister-clade analysis of discrete characters

2015-03-23 Thread Joe Felsenstein
This is not a sister-clade method but ...

If you are happy with a model in which there are underlying polygenic
quantitative characters, covarying ones whose evolution is modeled by
Brownian Motion model, and some two-state discrete characters arise by
imposing a developmental threshold on each quantitative character, you
might look at my 2012 paper in American Naturalist:
http://www.jstor.org/stable/10.1086/663681
The method is designed for use with a known phylogeny and infers the
evolutionary covariances of the underlying quantitative characters (the
liabilities).

My C program Threshml analysis this model by MCMC sampling.  It can, I
believe, be called by Liam Revell's R package  phytools.

This threshold model, a phylogenetic version of one which originated in
1934 with Sewall Wright, is mentioned in the Maddison and FitzJohn paper
that William Gearty cited.  It is just briefly mentioned by them, I gather
because the authors forgot about it until the last minute when writing the
review. Nevertheless it is getting increasing use.

Joe

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] multipage pdf of a huge tree

2015-03-05 Thread Joe Felsenstein
Jonas Eberle wrote:


 thanks a lot! I didn't knew splitplotTree yet. Great function! However, my
 tree has several thousands of tips (yes, it's a bit crazy but unfortunately
 necessary...) and I guess it's only possible to split it on two pages with
 splitplotTree. Or am I missing something?


It's not in R (unless available through Liam Revell's phytools package),
but in PHYLIP the tree-drawing programs Drawgram and Drawtree have the
capability of splitting a plot into a rectangular array of plots, and
putting these out onto separate files (not PDFs, but Postscript is
possible).

This was intended to help people make large posters using printers that can
only do a single page.

However ...

I do not see why this is necessary.  Most tree-drawing programs should be
able to write a file that has the large tree plotted on it.  If you don't
want to print the resulting tree on paper, it would then be possible to
view the tree in an application such as Adobe Acrobat Reader and zoom in on
it and see the tiny branches and their labels.  Making multiple plots for
one tree would probably confuse the matter.

Or is there something I am missing here?

J.F.

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] multipage pdf of a huge tree

2015-03-05 Thread Joe Felsenstein
Jonas Eberle --


 thank you for your answer! Actually, I used to make large size pdfs with a
 tiny font size in these cases. It is then even possible to print this
 single-page-pdf on multiple pages in Acrobat Reader. The problem was that
 my current tree is really huge - with about 23000 tips. This took me to the
 limits of this approach, since pdf size is limited to 200 inch by Adobe and
 there also seems to be a lower limit for font size in pdfs (at least during
 export form R?).

 I didn't knew that Drawtree is able to split the tree over multiple pages.
 I guess this is an option in the command line version? I will check that
 out. In the mean time I found a way to do the job in R (see my post before)
 that seems to produce reasonable results.


PHYLIP programs do not enter settings on the command-line.  They have a
menu.  The Drawgram and Drawtree programs have a choice that gives a
submenu in which multiple pages can be selected. This submenu is not
available in the GUI (Java front end) version of those programs, but is
available in the character-mode menu that appears if the program is run
from the command line.  It is the selection made by typing the character
 #.

J.F.

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] Phylogenetic analysis with sequencing error

2015-01-23 Thread Joe Felsenstein
George Shireff asked:


 I was wondering if there is a way to incorporate sequencing error in
 phylogenetic analysis, if I know there is sequencing error but it's
 randomly distributed.

 I imagine this could be as simple as setting the probability of the
 observed base at each tip to 0.99 rather than 1, with a probability of
 0.01/3 rather than 0 for each of the other tips (for a sequencing error of
 1%). I was hoping that something like pml (phangorn) can be modified to do
 that, but I haven't been able to figure it out myself yet.


How to do this in likelhood computations (and thus in either likelihood or
Bayesian methods) is explained in my book, on pages 255-266 in the
subsection of Chapter 16 called Handling ambiguity and error.  I think
that this is the first published treatment of that.

My colleagues Mary Kuhner and Jim McGill published a paper in 2014 on this,
with simulations showing that it helps to model the sequencing error:

Kuhner MK, McGill J
Correcting for sequencing error in maximum likelihood phylogeny inference.
G3 (Bethesda). 2014 Nov 4;4(12):2545-52. doi: 10.1534/g3.114.014365.

I believe that one major phylogeny program has very recently added code to
model this.  We have a version of Dnaml for PHYLIP that can do it, and hope
to release it soon.

As George Shireff saw, it is actually easy to do, and the computations are
not any slower.

J.F.

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] Phylogenetic analysis with sequencing error

2015-01-23 Thread Joe Felsenstein
Brian O'Meara added:


 Something like that should work. Unlike the case of ambiguity, where if you
 see, say, a Y at a site you'd give the probability of a C or a T at that
 site each 1, not 0.5 (Felsenstein's book has a good discussion of this) I
 think you're right that you'd want the probabilities to add up to one
 across bases: with sequencing error, the probability of it being A in the
 sequence given that it truly is A in the species is 99%. I poked around
 phangorn and didn't see an easy way to do this, but it should be possible
 (might require going into the C code to do this).


To emphasize what Brian said: the probabilities at a site, in this
computation, are *not* the probabilities of alternative outcomes.  They are
the probabilities of one outcome given four different underlying
situations.  Thus they don't have to add up to 1.  For the ambiguity case,
in my book I warned:  You may be tempted to make the quantities (1/4, 1/4.
1/4, 1/4), but you should resist this temptation.

For simple models of sequencing error the four numbers can add up to 1.
Can, but don't have to.  If an A is more likely to be misread than a G
(say), then you get numbers that don't add up to 1, and that's OK, you
should not worry about that, because they aren't the probabilities of four
different outcomes.  If we made up a 4 x 4 table where the first row shows
the probabilities of the four observations given that the base really is A,
and second row is the probabilities given that it really is G, etc., then
the four quantities you want at one site at one tip of the tree are the
four numbers in a column.  (Which column depends on what you observed).
The rows have to add to 1; the columns don't have to.

J.F.

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] hierarchical model with phylogenetic dependence term

2014-12-15 Thread Joe Felsenstein
This question was asked by Peter Smits and by Edwin Lebrija Trejos.  In
Peter Smit's question it was this:


 I have a similar question to Edwin. I too am working with a hierarchical
 Bayesian model, though I've implemented it in stan. I've included a
 phylogenetic random effect term, which is modeled as being distributed as
 multivariate normal with known covariance matrix up to a constant,
 sigma_{phy}. This follows Lynch '91 Am Nat and Housworth et al. '04 Am Nat
 by drawing on the similarity with the animal model from quantitative
 genetics.

 My question is about the scaling of the covariance matrix: is it necessary
 to have the the diagonal terms satisfy x = 1 and all the off diagonals to
 be 0 = x  1? I have a time scaled phylogeny from which I have my
 covariance matrix, so currently all elements of the matrix are not scaled
 so that the greatest distance from the root to a tip is 1. Currently, the
 elements of the matrix are just the sum of the shared branch lengths. Is
 this appropriate? Why or why not?


I hope people will correct me on this, but my take is:

1. To infer a variance term for the phylogenetic random term
one has to scale that term somehow, and then the variance
will specify how many of those scaling units are in this term.  If
you had the tree depth be 2 instead of 1, the variance inferred
would then be half as great, because it would be saying
how many multiples of 2 rather than how many multiples of 1.

2. I have not had time to read through all of Edwin's code to
see exactly what the model is.  However, note that there is a
distinction between individual effects that are on a whole
species, and individual effects that are on a single sampled
individual.  The latter are taking into account that the mean
phenotype of each species is only known from a finite sample
of individuals.  So it is taking into account within-species
phhenotypic variation and finiteness of sample sizes.  The
relevant methods there are by Ives, Midford and Garland
(2007) and by me (2008).  Ives et al. assume within-species
phenotypic variances and covariances are known, I give
methods for inferring them.

The methods of Lynch and Housworth are in effect
either assuming a sample size of 1 for each species,
or are considering their individual effect to be on the
whole species.

Within-species phenotypic variation can be a substantial
problem, as Ricklefs and Starck (1996) noted.  In their
example, the largest contrasts were between closely-related
species.  They suspected that this was due to within-species
phenotypic variation (which shows up as variance due to
sampling of the individual specimens measured).  It
causes trouble because the small branch lengths between
closely-related species are an inadequate predictor of
how different the species means will be.  In effect, the
model is wrong so some of the changes attributed to
between-species evolution are actually within-species
sampling variation (phenotypic variance).

Joe

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] Midpoint rooting routine?

2014-01-30 Thread Joe Felsenstein
Liam said:

 Joe's correct that this is not yet in our project to create an R interface
 for PHYLIP (Rphylip, https://github.com/liamrevell/Rphylip). As soon as I
 get a chance to work on this project today, I will add it.

That may be hard to do, as Retree is interactive and
has two levels of menus.

Joe

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] Sampling from all bifurcating and multifurcating trees

2013-12-24 Thread Joe Felsenstein
Eduardo Ascarrunz ear...@gmail.com wrote:

 Also, I figured out I could work with unrooted trees. I suspect the procedure 
 would be similar to your rooted method, wouldn't it?

There is a 1-1 correspondence between n-species rooted tree topologies
and (n+1)-species unrooted tree topologies (this is discussed in
Chapter 3 of my book).  So you can simply generate a bifurcating
rooted tree, or a multifurcating rooted tree, of  n-1  tips, and then
unroot it, making the previous root now be species  n.   Then you get
a randomly sampled unrooted tree.

This works for tree topologies but not, I think, for labeled histories.

Joe

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] R help generating a heat map

2013-12-23 Thread Joe Felsenstein
One can sample from all possible rooted labeled histories by randomly
bifurcating lineages, and at the end assigning the names randomly to the
tips.  Note that a labeled history is not the same as a tree topology.

One can sample randomly from all topologies of rooted bifurcating trees
with labeled tips by adding the species one by one to a one-species tree,
each time choosing to split them off from one of the branches.  The
enumeration of all such trees is based on counting all possible ways of
doing this, so choosing one such sequence should choose one of them at
random. Jim Rohlf has a paper with an algorithm equivalent to this in (I
think) Systematic Zoology but I cannot locate it at the moment.

For rooted multifurcating trees a similar method could be used.  In my 1978
paper in Systematic Zoology I showed that each such tree corresponded to
way adding the species 1 to n in order, where at each step we choose
equiprobably from among all branches and all interior nodes (so that if
there are 13 branches and 6 interior nodes we split off one of these 19 at
random). The enumeration algorithm for which this is based is also
discussed in detail in Chapter 3 of my book.

Joe


On Mon, Dec 23, 2013 at 8:57 PM, Eduardo Ascarrunz ear...@gmail.com wrote:

 Hi Liam,

 Thank you! That looks clever. How does this method bias the sampling? I
 think it could be useful for test running my code anyway.

 Looking forward your findings, and all the best,

 E.
 On 24 Dec 2013 04:49, Liam J. Revell liam.rev...@umb.edu wrote:

  Hi Eduardo.
 
  You could try something like this:
 
  randomFurcTrees-function(n,N=1){
foo-function(n){
  t-di2multi(rtree(n,br=sample(c(0,1),
size=2*(n-2),replace=TRUE),rooted=FALSE))
  t$edge.length-NULL
  t
}
trees-if(N1) replicate(N,foo(12),simplify=FALSE) else foo(n)
if(N1) class(trees)-multiPhylo
return(trees)
  }
 
  What this does is it generates random bifurcating topologies (using rtree
  in ape) with branch lengths that can be 0 or 1; then it collapses all
 zero
  length branches to polytomies. This is (demonstrably) *not* the same as
  picking trees at random from the set of all bi- and multifurcating
  topologies. It's not immediately obvious how you could do that, but I'll
  think about it.
 
  All the best, Liam
 
  Liam J. Revell, Assistant Professor of Biology
  University of Massachusetts Boston
  web: http://faculty.umb.edu/liam.revell/
  email: liam.rev...@umb.edu
  blog: http://blog.phytools.org
 
  On 12/23/2013 10:15 PM, Eduardo Ascarrunz wrote:
 
  Hello everyone,
 
  Newbie here. I'm looking for a way to generate random trees of N tips,
  allowing multifurcations. My N would be 12, so it wouldn't be practical
  (nor possible) to work with all the possible trees (cf. allFurcTrees).
 I'd
  be happy with a set of 1000 trees, sampled equiprobably. I'd much
  appreciate any help.
 
  Best to all,
 
  E.
 
  [[alternative HTML version deleted]]
 
  ___
  R-sig-phylo mailing list - R-sig-phylo@r-project.org
  https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
  Searchable archive at http://www.mail-archive.com/r-
  sig-ph...@r-project.org/
 
 

 [[alternative HTML version deleted]]

 ___
 R-sig-phylo mailing list - R-sig-phylo@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
 Searchable archive at
 http://www.mail-archive.com/r-sig-phylo@r-project.org/




-- 

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] Transforming data for OU model

2013-11-28 Thread Joe Felsenstein


Ted Garland wrote:

Our programs are in DOS (What's DOS?  The last operating system  
that worked ...).
However, I think you can also do this in R somewhere, maybe  
phytools?  Liam Revell, want to jump in here?


Just to remind people: DOS (also called MSDOS) executables programs  
can be run in Windows using the Command Prompt tool, which you will  
find in the Accessories menu that is found in the menu

opened by the All Programs tab in the Start Menu.

Joe

Joe Felsenstein, j...@gs.washington.edu
 Dept. of Genome Sciences, Univ. of Washington
 Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] PGLS vs lm

2013-07-12 Thread Joe Felsenstein

Tom Schoenemann asked me:

 With respect to your crankiness, is this the paper by Hansen that you are 
 referring to?:
 
 Bartoszek, K., Pienaar, J., Mostad, P., Andersson, S.,  Hansen, T. F. 
 (2012). A phylogenetic comparative method for studying multivariate 
 adaptation. Journal of Theoretical Biology, 314(0), 204-215.
 
 I wrote Bartoszek to see if I could get his R code to try the method 
 mentioned in there. If I can figure out how to apply it to my data, that will 
 be great. I agree that it is clearly a mistake to assume one variable is 
 responding evolutionarily only to the current value of the other (predictor 
 variables). 

I'm glad to hear that *somebody* here thinks it is a mistake (because it really 
is).  I keep mentioning it here, and Hansen has published extensively on it, 
but everyone keeps saying Well, my friend used it, and he got tenure, so it 
must be OK. 

The paper I saw was this one:

Hansen, Thomas F  Bartoszek, Krzysztof (2012). Interpreting the evolutionary 
regression: The interplay between observational and biological errors in 
phylogenetic comparative studies. Systematic Biology  61 (3): 413-425.  ISSN 
1063-5157.

J.F.

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] question about measurement error in phylogenetic signal (Krzysztof Bartoszek)

2013-07-08 Thread Joe Felsenstein

In addition to the references to papers by Hansen and Bartoszek, and by Ives, 
Midford and Garland, I would biasedly suggest this paper:

Felsenstein, J. 2008. Comparative methods with sampling error and 
within-species variation: contrasts revisited and revised. American Naturalist 
171: 713-725.

The method estimates the within-species phenotypic variation (which, when you 
are analysing species means is the relevamt measurement error and also 
includes actual measurement error) and corrects for it.

The software announced there is not in R, but I believe that Liam Revell's  
phytools  package can call our program.

Joe

Joe Felsenstein j...@gs.washington.edu
Department of Genome Sciences and Department of Biology,
University of Washington, Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] question about analysing trait differences

2013-04-22 Thread Joe Felsenstein


Brian O'Meara wrote:

The problem is reconstructing overlap down the tree (it's possible  
to do,
but whether it's possible to do well is another question). One  
thing you
could do to avoid this is to use the *other* method from  
Felsenstein's 1985
independent contrasts paper, taking pairs of species independent of  
other

pairs. A way to do this:

1) Get a clade of size two (aka a cherry sensu Semple and Steel).  
Every

tree has at least one.
2) Difference in overlap and diference in mass for the pair of taxa  
making

up this clade becomes one point (perhaps standardize for time, and
positivize it such that you subtract them in the order that leaves  
the mass

difference positive).
3) Prune these two taxa off the tree.
4) Go back to step 1. When you're done, assuming no polytomies,  
you'll have
zero or one species left. [it's possible to do with polytomies,  
too: if
assumed to be hard, then just do a pair of taxa from a terminal  
polytomy
and leave the rest for the next step. If assumed to be soft, then  
just take

two from a terminal polytomy and discard the remaining taxa]

Contrasts gives you Ntax - 1 contrasts. This gives you floor(Ntax/ 
2) but

avoids having to do reconstructions. I have code around to do this but
can't find it at the moment, but it's not hard to write -- post  
again if

you need help with it and no better ideas have been proposed.


Note that this method (which goes all the way back to Salisbury
in 1942 as mentioned in my 2004 book) requires that in
step 3 prune does not mean using the pruning algorithm
of likelihood evaluation.  It means removing these two tips
(and their shared most recent common ancestor), leaving
behind nothing. So not calculating and adding to the
tree any phenotypes for the most recent common ancestor.

Thus if we find (chimp, bonobo) to be the cherry we
remove them, leaving a tree such as

(macacque, (gibbon, (orang, gorilla)));

so now (orang, gorilla)  is a cherry.

Joe

Joe Felsenstein, j...@gs.washington.edu
 Dept. of Genome Sciences, Univ. of Washington
 Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] question about analysing trait differences

2013-04-22 Thread Joe Felsenstein


I wrote:


Thus if we find (chimp, bonobo) to be the cherry we
remove them, leaving a tree such as

(macacque, (gibbon, (orang, gorilla)));

so now (orang, gorilla)  is a cherry.


but should have written

...leaving a tree such as

((macacque, (gibbon, (orang, (gorilla, human;

so now (gorilla, human) is a cherry.

Joe

Joe Felsenstein, j...@gs.washington.edu
 Dept. of Genome Sciences, Univ. of Washington
 Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] statistically test whether two characters evolve dependently

2013-04-09 Thread Joe Felsenstein

New Bio wrote:

 Thanks, Liam. good to know. I think extending the tools to analyze traits
 of ordered/unordered multistates can be very useful. There are many
 interesting traits such as oxygent requirement (anaerobic, facultative,
 aerobic) of microbes, which is ordered multistates, and habitats (water,
 air, soil), which is unordered.


For approaches like the threshold model, it is not simple to see how to
handle multistate characters. Should we assume one scale? If so, how
far from the 0/1 threshold should we place the 1/2 threshold? That
becomes an additional parameter to be estimated.

Or should we have an additional axis, so that 0/1 is on the  x  axis,
while  [01]/2 is on another axis?  And then what do we do about
state 3 if it also exists? That way lies madness ... or perhaps a
good Ph.D. thesis topic.

(This is in some way related to Transformation Series Analysis,
which was an issue with parsimony methods. One imagined one's
states arranged in a character transformation series which could
even be a tree, the Character State Tree. Then one wanted to
infer the phylogeny and at the same time also the CST, using
parsimony as the criterion.  In a sense what I am raising is the
likelihood and modeling equivalent problem.)

Joe

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA




[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] Seeking list of nucleotide substitution models

2013-04-06 Thread Joe Felsenstein

Emmanuel Paradis answered Daniel Barker's inquiry:

 Is there a review or list of ~every specific nucleotide substitution model
 that has been proposed or used in the literature (with references)?

 Hi Daniel,
 
 Have you looked at the help page of dist.dna? The list of references there is 
 focused on calculating distances but some are general. There are a few more 
 references in my book too. I suggest you also look at Inferring Phylogenies.

where you will find, in the section on Other distances in Chapter 13  that 
Andrey Zharkikh had a very good paper in 1994 covering many of the distances 
that had been invented up to that date, and explaining relationships between 
them:

Zharkikh, A. 1994. Estimation of evolutionary
distances between nucleotide sequences.
Journal of Molecular Evolution 39: 315.329

Joe

Joe Felsenstein j...@gs.washington.edu
Department of Genome Sciences and Department of Biology,
University of Washington, Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] testing for correlates of rates of evolution

2013-03-15 Thread Joe Felsenstein

Rob Lanfear wrote:

 In particular, Liam's using squared contrasts in y, so that's 
 asking whether the absolute size of changes in y depends on  
 x at the ancestral node. I might have missed something here, 
 but that sounds very similar in principle to Freckleton's 
 test of whether the variance of trait differences is 
 unrelated to their absolute values [1], except that the 
 latter looks for correlations between the absolute value of 
 differences in x versus x at the ancestral node. It might be 
 useful to consult that paper [1] to get some more ideas for 
 how to interpret those kinds of results.

If there is proportionality between standard deviation of 
change and the value of the character, that is essentially 
saying that the log(x) changes at rate independent of its 
value.  Similarly, if the variance of change is proportional 
to the value of the character, that is essentially the same 
as saying that the square root of the character changes at a 
rate independent of its value.

See this Wikipedia page:

http://en.wikipedia.org/wiki/Variance-stabilizing_transformation

Perhaps the whole test can be done by considering different
transforms of the data (there are parameterized families of them)
and use the likelihood to test values of the transform parameters (with
appropriate correction of the likelihood by having a Jacobian term).

Joe

Joe Felsenstein j...@gs.washington.edu
Department of Genome Sciences and Department of Biology,
University of Washington, Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] From ClustalW2 Tree to Heat Map in R

2013-01-10 Thread Joe Felsenstein

Klaus Schliep wrote:

 There is quite some irony that in phylogenetic reconstruction often
 non-ultrametric methods are preferred, even though the time to the
 last common ancestor (LCA) should be for each extend species the same.
 However other fields use heavily ultrametric methods (hclust) even
 there exist no evident equivalent property like the LCA.

The irony, such as it is, is due to the fact that we can only see amounts of 
difference, not amounts of time, and, darn it, different organisms have 
different biologies so they change at different rates.  Which makes biology 
more interesting but makes molecular change less clocklike.

Joe

Joe Felsenstein j...@gs.washington.edu
Department of Genome Sciences and Department of Biology,
University of Washington, Box 355065, Seattle, WA 98195-5065 USA




[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] Why no branch lengths on consensus trees?

2012-11-22 Thread Joe Felsenstein

Emmanuel Paradis wrote:

 If you are Bayesian, the trees sampled from an MCMC are here for estimation 
 including of the branch lengths, so you use them to compute some sort of 
 consensus topology as well as its branch lengths. So it makes sense that 
 MrBayes can do a consensus tree with branch lengths.


I endorse the rest of Emmanuel's advice but let me quibble with this one.  The 
posterior on trees may not consist mostly of trees varying around a single 
consensus.  If the posterior had, for example, two modes, each centered around 
a different tree, a single consensus tree might not be appropriate, and branch 
lengths computed by averaging lengths over the two modes might not be a good 
guide to what the trees in the posterior looked like.  I don't know enough 
about MrBayes features to know whether they have some way around this.

There is a similar issue with parsimony methods -- the set of most parsimonious 
trees may have a consensus, which may well not be a most parsimonious tree. 
People who see the consensus of most parsimonious trees may not realize that 
the particular tree they are looking at is not most parsimonious.

J.F.

Joe Felsenstein j...@gs.washington.edu
Department of Genome Sciences and Department of Biology,
University of Washington, Box 355065, Seattle, WA 98195-5065 USA

(from 1 October 2012 to 10 December 2012 on sabbatical  leave at)
Department of Statistics, University of California, Berkeley, 367 Evans Hall, 
Berkeley, CA  94710

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] Cluster tips into OTUs

2012-11-15 Thread Joe Felsenstein

Miguel Verdu --

 I want to perform a mothur-like OTU-based approach, but based on phylogenetic 
 instead of DNA sequence distances. Is there any way to cluster tips of a tree 
 into OTUs determined by a threshold of phylogenetic distances?. In other 
 words, I want to collapse all the tips connected with distances less than X 
 into the same OTUs.

I don't know which R program will do this (probably just a one-line if 
command) but the obvious method would be to take the distance matrix and reduce 
all the elements of it that are less than X to 0.000. Then run any distance 
method.

J.F.

Joe Felsenstein j...@gs.washington.edu
Department of Genome Sciences and Department of Biology,
University of Washington, Box 355065, Seattle, WA 98195-5065 USA

(from 1 October 2012 to 10 December 2012 on sabbatical  leave at)
Department of Statistics, University of California, Berkeley, 367 Evans Hall, 
Berkeley, CA  94710




[[alternative HTML version deleted]]

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] asymmetric transitions

2012-08-18 Thread Joe Felsenstein

Daniel Barker wrote:

 On ancestral state reconstructions. I've recently started using Ziheng
 Yang's terminology - of referring to reconstructions, derived from ML
 transition rates and equilibrium frequencies, as 'empirical Bayes'
 reconstructions. I believe this to be the most useful way to describe
 these methods.
 
 My question. Were does empirical Bayes stand, w.r.t. the Likelihood
 Principle?
 
 Empirical Bayes seems beyond the scope of the Likelihood Principle, or in
 violation of it. The biological hypotheses (here, of ancestral state) are
 not expressed as parameters of the model, so the relative support is not
 judged by a likelihood ratio. On the other hand, by only using the data at
 hand, empirical Bayes does comply with 'the irrelevance of outcomes not
 actually observed'.
 
 If anyone can provide or point to some thoughts on this, I'd be very
 grateful - for ancestral states or in general.

It has long been recognized -- since Anthony Edwards's 1970 paper and
Elizabeth Thompson's brilliant 1975 thesis and book --
that the interior node reconstructions are not, strictly speaking, MLEs.
They are maximum posterior probability estimates instead.  The root
ancestral states can be considered MLEs if that state is one of the
parameters of the model.  If instead (as often done for DNA and
protein data) the root ancestral state is considered to be drawn from
the equilibrium distribution of the stochastic process, then they too are
MPP estimates.

I would only call them empirical Bayes estimators if one made an ML
estimate of some stuff (the whole tree topology, which sounds like an
extreme case) and then, assuming the correctness of that estimate,
the ancestral states are inferred by being Bayesian.   In that case
probably the only thing that would be taken to have a prior on it
would be the ancestral state.  Then you could take the MPP as
the modal estimate from the posterior.

So, if I have understood his point correctly, Ziheng is formally correct, 
but it is a case where one is not being very Bayesian.  For my money,
you are a Bayesian not just if you use a prior, but if you are willing to
use a controversial prior.   And in this case the prior is pretty
uncontroversial.

Joe

Joe Felsenstein j...@gs.washington.edu
Department of Genome Sciences and Department of Biology,
University of Washington, Box 355065, Seattle, WA 98195-5065 USA




[[alternative HTML version deleted]]

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] asymmetric transitions

2012-08-16 Thread Joe Felsenstein


Mark Holder wrote:


With respect to Jarrod's simulations, I have a few thoughts:
	1. I don't understand the claim (in the original email) that its  
fairly straightforward to prove that asymmetric transition rates  
cannot be identified using data collected on the tips of a  
phylogeny It seems like this is something that is routinely done  
in phylogenetics, and proofs of identifiability of GTR exist  
(demonstrating that  this indeed feasible and not some  
computational artifact).


I agree.  This whole discussion may have started from a fact that  
is *not* fairly straightforward to prove.   In which case it is not  
necessary to bring speciation rates or priors into it.


A reversible two-state model should be able to have its parameters  
estimated on a given tree, clocklike or not.


Joe

Joe Felsenstein, j...@gs.washington.edu
 Dept. of Genome Sciences, Univ. of Washington
 Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] simulate traits evolution in correlated with body mass

2012-07-24 Thread Joe Felsenstein

Liam Revell wrote:

 library(phytools)
 y-r*x+sqrt(1-r^2)*fastBM(tree,sig2=mean(pic(x,tree)^2))

 This should give you the correlation r on average and the y|x should be 
 Brownian. (If x is Brownian, then both y  x  y|x will be too.)

If  y  is evolving in response to  x, and  x  is changing too, then wouldn't 
the expectation of  y  be a value that is predicted, not by the current  x,  
but to some extent also from past  x's?

One would want some much more specific model.

*** whining on ***
This is part of my stream of oft-repeated, widely-ignored complaints that one 
ought not regress present-day  y's  on present-day  x's.  Lots of people are 
doing it -- and they're all wrong.
*** whining off ***

Joe

Joe Felsenstein  j...@gs.washington.edu
 Dept of Genome Sciences and Dept of Biology, Univ. of Washington, Box 5065, 
Seattle Wa 98195-5065

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] LL ratio test

2012-06-17 Thread Joe Felsenstein

Ben Bolker --

 Now we can apply Fisher's  Likelihood Ratio Test of Fisher as well
 as the AIC.  The LRT tells us that the expectation 
 
  ... under the null hypothesis where M0 (the simpler model) is true?

Yup.  I'm considering that case to see how the AIC fits in with the LRT.
Of course the AIC is proposed for a much wider range of cases.

 of   
 2 log(L1) - 2 log(L0)   is   d1 - d0   (because it is distributed as a
 Chi-Square with that number of degrees of freedom).
 
 But the AIC tells us that the expectation is  2(d1 - d0).
 
  Maybe I'm missing something, but I don't see how the AIC tells us
 something about the expectation of 2 log(L1) - 2 log(L0) ?  It gives us
 the expectation of the Kullback-Leibler distance, which is something
 like sum(p(i) log(p(i)/q(i)) where p(i) is the true distribution of
 outcomes and q(i) is the predicted distribution of outcomes ... so it's
 something more like a marginal log-likelihood difference rather than a
 maximum log-likelihood difference ...

Well, the AIC ends up with comparing   -2 log(L) + 2d  for the two
hypotheses.   The difference of these for models  M1 and M0
is just (the negative of) 2 log(L1/L0) - 2(d1-d0).Or have I
missed something here?  So the expectation of the difference
is log likelihood  *is*  described by the AIC, right?   And isn't it
(in view of Fisher's distribution) wrong too?   That is what
disturbs me and makes me feel there is something I don't
understand about the AIC argument.

Joe

Joe Felsenstein j...@gs.washington.edu
Department of Genome Sciences and Department of Biology,
University of Washington, Box 355065, Seattle, WA 98195-5065 USA




[[alternative HTML version deleted]]

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] LL ratio test

2012-06-14 Thread Joe Felsenstein

Carl Boettiger wrote:

 Others on the list can weigh in with more authority, but perhaps this will
 get the discussion started.

Yes, it's important to know whether the parameters are nested, and
the issue of being at the end of a parameter range is serious.

 Recall that AIC values are a fequentist statistic: and they obey the very
 same same distribution as the likelihood ratio, (recall it is a difference
 log likelihoods, just shifted by the difference in the number of parameters
 (e.g. -2 [ log L1 -  log L0 - (k1 - k0)]).  Recall that the maximum
 likelihood estimate (MLE) is a biased estimate of the likelihood of your
 data and that AIC penalty is simply creating an asymptotically unbiased**
 estimator of the true model likelihood, which is a frequentist concept to
 begin with.  Why we report confidence intervals/p-values in the case of one
 of these statistics but not the other is not obvious to me either.

I will confess my relative ignorance of AIC issues (my phylogeny book
has a simple, elegant, and clear explanation -- which I wrote in a hurry 
while excited that I finally understood this, and which turns out to
make no sense whatsoever and should be firmly ignored by all).

But I do know this: If we have the likelihood ratio  R = L(p')/L(p)  where
p'  is the ML parameter values and  p  is the true parameter values,
and where  p  is in the interior of the set of possible parameters,
then RA Fisher showed about 1922 that asymptotically with large
amounts of data:

2 log(R)   is distributed as  chi-square with  D  degrees of freedom,
where  D  is the difference of the number of parameters being 
estimated in  p' and the number of parameters being estimated in
p.   Now we know that the expectation of that chi-squared variable
is  D.   So to correct the bias in  R   we should subtract  D.   That sounds
like what Carl is explaining too.

It sounds like a very simple and clear explanation of the AIC.  Unfortunately
that subtraction is *not* what AIC does.   It subtracts  2D.   The reason
it does so is unclear to me.  It involves some kind of prior on models,
I think.  As far as I am concerned it is like the peace of god, in that it
passeth human understanding.

Maybe the experts here can give me a simple explanation.  Otherwise
maybe we should honor Fisher (not me) and only subtract  D, and call the
result the FIC,  But that works only for nested hypotheses, and the main
point of the AIC is to deal with non-nested hypotheses.  To make matters
worse, in my field the AIC has the reputation of too easily favoring the
most complex hypothesis, so maybe we should be subtracting more than
2D, not less.

Clueless in Seattle.

Joe

Joe Felsenstein j...@gs.washington.edu
Department of Genome Sciences and Department of Biology,
University of Washington, Box 355065, Seattle, WA 98195-5065 USA




[[alternative HTML version deleted]]

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] Using AIC/AICc/BIC to select trait a model of trait evolution

2012-05-02 Thread Joe Felsenstein

Matt Pennell wrote:

 My question is: how many observations do we have when we compare trait
 evolutionary models? People tend to use the number of tips of taxa for
 which we have trait values. However, this may not be technically  
 accurate.
 First, of course, both the branch lengths and the tip values factor  
 into
 the likelihood equations so it seems sensible that these are both  
 somehow
 included as observations. Second, the trait values we observe are  
 of course
 not independent (that is the whole reason we are using a phylogeny  
 in the
 first place!!). It is unclear whether/how this fact should factor  
 into our
 calculation of the n. I know that it phylogenetics, when people do  
 model
 selection for the model of sequence evolution, they use the number  
 of sites
 in the alignment though i am not sure there is a clear  
 justification for
 this either.


On just one of these points:  number of sites in a molecular  
alignment is completely justified as the number to use in computing  
the number of data points, if the evolution at the sites is  
independent (i.e. independent, given the true phylogeny).   In that  
case each column of the data matrix is drawn i.i.d. from a distribution.

Of course, even correlations of rates of evolution among neighboring  
sites violates this.

Whether and how number of species comes in is trickier.

Joe

Joe Felsenstein  j...@gs.washington.edu
  Dept of Genome Sciences and Dept of Biology, Univ. of Washington,  
Box 5065, Seattle Wa 98195-5065


[[alternative HTML version deleted]]

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] Comparative Methods and Pseudo-Traits

2011-11-14 Thread Joe Felsenstein

Liam Revell wrote:

  Poaching Intensity = beta0 + beta1*Body Size + e

 I think it depends on how the residual error in the model is  
 distributed (esp. correlated) among species.  It seems possible to  
 invent hypothetical scenarios (as I did in my previous email) about  
 how the residual error in poaching intensity given body size could  
 be phylogenetically autocorrelated, but this is fundamentally an  
 empirical question.  If the residual error of poaching intensity  
 given body size is phylogenetically correlated and we ignore this  
 then we risk overestimating the predictive value of our model.

 In addition, the residual error is likely/guaranteed to be non- 
 Brownian if the response variable is binary (e.g., extant v.  
 extinct).  For these type of data the tree should not be ignored,  
 but simple GLS regression is probably not appropriate.  One option  
 might be the phylogenetic logistic regression of Ives  Garland  
 (2009), but I'm not too familiar with this method.

The real problem would come if the characters evolved to respond to  
the poaching intensity, and that evolution was inherited along the  
tree.  Then our uncertainty about the poaching intensity in the past  
would be a big problem.

But if poaching is only  a present-day phenomenon, it would  
(presumably) respond to only today's characters, and they would not  
yet have responded to it, so there would be no problem.

But I do think it is a Big Mistake (and apparently a frequent one),  
when people measure the residual of a character regressed on  
environmental variables,
then casually assume that it is undergoing Brownian Motion, when the  
environmental variables may have been different in the past.   That's  
probably not an issue here.

Joe

Joe Felsenstein  j...@gs.washington.edu
  Dept of Genome Sciences and Dept of Biology, Univ. of Washington,  
Box 5065, Seattle Wa 98195-5065


[[alternative HTML version deleted]]

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] Comparative Methods and Pseudo-Traits

2011-11-10 Thread Joe Felsenstein
Liam wrote:

 I'm not sure I entirely agree that we need to assume that the environmental 
 trait is evolving on the tree by Brownian motion.  I believe that so long as 
 Y|X (in David's example, growth rate given habitat degradation) is evolving 
 by Brownian motion, we should be OK to use PIC regression.

I worry.  Why are we to assume that current phenotypes are distributed around 
optima that are based on  *current* environmental variables?

Joe
---
Joe Felsenstein, j...@gs.washington.edu
Department of Genome Sciences and Department of Biology
University of Washington
Box 355065
Seattle WA 98195-5065

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] Is correlation of PICs, with tip data and each terminal node split into male and female, a valid method?

2011-11-05 Thread Joe Felsenstein

Anthony Ives wrote:

 Just a quick comment on your response to Dan.  You can use REML to compare 
 models, but only to compare the variance components and only if they have 
 the same mean components (e.g., contain the same predictor variables).  If 
 there are differences in the mean components, you need to use ML, because 
 (very loosely speaking) REML is based on estimating the variances conditional 
 on the mean.  I have come across examples where the results using ML and REML 
 (accidentally) are quite different, with ML being correct.

This is a good point.  But in many cases a REML analysis drops only the grand 
mean.  In effect it is ML, after passing the data through a filter which 
replaces it by the differences
of each point from the grand mean for that character.

If the grand mean is not of interest, REML should work as well as ML.  If we 
are trying to test some hypothesis about the value of the grand mean, of course 
REML is inappropriate.

An example would be simple one-way ANOVA which is actually a REML analysis, and 
it does test means of the groups.

Joe

Joe Felsenstein j...@gs.washington.edu
Department of Genome Sciences and Department of Biology,
University of Washington, Box 355065, Seattle, WA 98195-5065 USA




[[alternative HTML version deleted]]

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] Show Informative Sites?

2011-09-21 Thread Joe Felsenstein

Leandro Jones --

 Nick's rules won't allways work in a Parsimony context. For example, a
 position like this one:
 1 A
 2 A
 3 T
 4 C
 would be informative under rules (a) and (b), but it is in reality
 uninformative, as any of the possible trees have a length of 2. Thus,
 this character tells us nothing about _phylogeny_.

I disagree.  If the only way you can interpret anything is by parsimony,
sure.  But for statistical phylogenetics, it has information.  It works
against any phylogeny that has all its branch lengths short, for example.

Eliminating that character, not telling the statistical method you did
that, and then going ahead with the analysis is a Big Mistake.

J.F.

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] Show Informative Sites?

2011-09-17 Thread Joe Felsenstein

Nick Matzke wrote:

 All sites are informative under likelihood,

... and under Bayesian methods, and under distance matrix methods too.  If you 
leave out uninformative sites without compensating for doing that by an 
explicit statistical correction, you will have some very unpleasant surprises.

J.F.

Joe Felsenstein j...@gs.washington.edu
Department of Genome Sciences and Department of Biology,
University of Washington, Box 355065, Seattle, WA 98195-5065 USA


[[alternative HTML version deleted]]

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] R: Re: R: ancestral state reconstruction for tips

2011-08-05 Thread Joe Felsenstein

Pasquale Raia said:

 Of course Ted is right, but my problem with this computation, or  
 with the
 simple exercise I was proposing is well another: as a  
 paleontologist I often
 come across pretty exceptional phenotypes (dwarf hippos and  
 elephants, huge
 flightless birds, to make a few examples). When you use methods  
 like this (I
 mean Garland and Ives') and compare the output with those  
 phenotypes, as I did,
 you immediately realize what the the bottom line is: no matter if  
 they are
 nodes or tips, by using the expected (under BM) covariance the  
 estimated
 phenotypes are dull, perfectly reasonable but very different from  
 anything
 exceptional you may find yourself to work with. This is why I feel  
 it is
 difficult to rely on those (unobserved) values to begin with.

I think that what is being said is that Brownian Motion is too sedate  
a process
and does not predict some of the large changes actually seen in the  
fossil
record.

That's a legitimate point but does put the onus on the maker of the  
point to
propose some other stochastic process that is tractable and has these  
large
changes (and that fits with known Mendelian and Darwinian mechanisms).
Just complaining that the Brownian stochastic process is no good is  
insufficient.

If we want to add the fossils to the calculation, then they will of  
course
pressure the Brownian Motion process to change more in their vicinity,
which may help some.

Joe

Joe Felsenstein  j...@gs.washington.edu
  Dept of Genome Sciences and Dept of Biology, Univ. of Washington,  
Box 5065, Seattle Wa 98195-5065


[[alternative HTML version deleted]]

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] R: Re: R: ancestral state reconstruction for tips

2011-08-05 Thread Joe Felsenstein

David Bapst wrote:

 I think Joe is right that realizing a model is an inaccurate or
 imprecise description of reality should impel us to develop better
 models of the world around us, because this partly how science moves
 forward. However, I don't think pointing out that a model is deficient
 requires that that person must themselves develop an alternative.
 After all, an alternative model that capture a more realistic level of
 complexity may not be possible in some situations (it is certainly
 possible in trait evolution models, however.) Requiring such a thing
 would put too much pressure on scientific whistle-blowers, who play a
 very important role in reminding the rest of us that the world is more
 than the models we use to understand it and make our predictions.

As a theoretician, I am perhaps oversensitive to the folks who, after
a lecture in which I present a simple model, triumphantly declare
but you didn't include predator satiation.  Then they walk away
looking very pleased with themselves.

There is a similar problem with the quibblers who inhabit grant
review panels, and are always asking me to do much more
complicated models that are impossibly hard (and they
are not aware how hard they are).

Just understand, when you raise legitimate concerns, that us
model-analyzers are also used to getting a lot of these unreasonable
demands too, and may be grumpy as a result.

Joe

Joe Felsenstein  j...@gs.washington.edu
  Dept of Genome Sciences and Dept of Biology, Univ. of Washington,  
Box 5065, Seattle Wa 98195-5065


[[alternative HTML version deleted]]

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] R: Re: R: Re: R: ancestral state reconstruction for tips

2011-08-05 Thread Joe Felsenstein

Folks --

I was intending my most recent message to be apologetic --
that I was perhaps overreactive.  Certainly Pas has not
raised unreasonable objections or been obstructive with
my grants! (Others have).

Let me raise an issue so I understand him more clearly:
Pas, are you saying that you see phenotypes in the fossils
that seem incompatible with the Brownian Motion assumption?

Joe

Joe Felsenstein  j...@gs.washington.edu
  Dept of Genome Sciences and Dept of Biology, Univ. of Washington,  
Box 5065, Seattle Wa 98195-5065


[[alternative HTML version deleted]]

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] R: Re: R: Re: R: Re: R: ancestral state reconstruction for tips

2011-08-05 Thread Joe Felsenstein

Pas said:

 What I', trying to do now is writing a R routine to back-calculate  
 the expected branch lengths for the unusual critters, given the  
 fitted ancestral values and tip values of the phenotypes, and  
 assuming BM, in order to compare the actual branch lengths to the  
 expected. The ratio of these lengths, if I'm not delusional and  
 definitely lucky, is a per-lineage rate of phenotypic evolution.  
 The bottom line is answering the question: how long should the  
 branch leading to that particular species be if it evolved at the  
 same rate of its sister species?

Good way to approach it.  If you can calculate the likelihood of  
trees, one way would be to not bother fitting any ancestral values:  
just try different lengths for the branch that connects the fossil to  
the tree, and see which one maximizes the likelihood.

Joe

Joe Felsenstein  j...@gs.washington.edu
  Dept of Genome Sciences and Dept of Biology, Univ. of Washington,  
Box 5065, Seattle Wa 98195-5065


[[alternative HTML version deleted]]

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] comparative methods at subspecies level

2011-07-24 Thread Joe Felsenstein

Paolo Piras wrote:

 I write you in order to ask a phylosophical opinion in
 applying phylogenetic comparative methods to
 subspecies belonging to the same species for which a
 phylogenetic/phylogeographic scenario is known on the
 basis of molecular studies. I have a number (10) of
 population samples belonging to two species at
 continental scale. Some OTUs are surely NOT panmittic
 with the others while for some others this is not
 sure.  Someone could argument against the use of
 comparative methods when a gene flow is still present
 among OTUs. However, I think that, if possible,  AT
 LEAST the covariance due to populations relationships
 should be removed. You dont think that it is always
 better to take in account all possible relatedness 
 betwen OTUs? In this paper
 
 D. C. BLACKBURN,G. J. MEASEY. 2009.Dispersal to or
 from an African biodiversity hotspot? Molecular
 Ecology; Volume 18, Issue 9, pages 1904–1915, May
 2009.
 
 the authors apply common comparative methods to a
 phylogeography of populations belonging to ONE
 species. I found their strategy appropriate. I wrote
 to ask your opinion about this issue.


Using standard tree-based methods to treat a single species which has 
populations that exchange migrants is inappropriate.  In a 2002 paper I 
discussed how to do this properly using contrasts which are derived from the 
eigenvectors and eigenvalues of (an estimate of) the migration matrix between 
populations.  Such migration matrices can now be inferred using programs LAMARC 
and Migrate.

The paper is here:
Felsenstein, J.  2002. Contrasts for a within-species comparative method.
  pp.  118-129 in Modern Developments in Theoretical Population Genetics,
  ed. M. Slatkin and M. Veuille.  Oxford University Press, Oxford.
Here
   http://evolution.gs.washington.edu/papers/spectrum/
is where you can get a PDF of a preprint version of it.

The method is univariate at the moment but a multivariate version is under 
development -- the univariate version can be used to see whether a character 
has significant covariation with an environmental measurement.   

The method is also described in a recent paper by Stone, Nee and me:

Stone, G. N., S. Nee, and J. Felsenstein. 2011. Controlling for 
non-independence in comparative analysis of patterns across populations within 
species. Philosophical Transactions of the Royal Society B 366 (1569): 
1410-1424.

which discusses the general issue.  The (other) authors preferred, for greater 
comprehensibility, to describe my method slightly incorrectly (I got outvoted).

Anyone considering this issue should consider these two papers carefully.   Of 
course mixing within- and between-species analyses is more difficult yet.   I 
hope to release an R package later this year to do the one-character analysis 
(it is not too hard to put one together yourself in the meantime).

Joe

Joe Felsenstein j...@gs.washington.edu
Department of Genome Sciences and Department of Biology,
University of Washington, Box 355065, Seattle, WA 98195-5065 USA




[[alternative HTML version deleted]]

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] pic() vs gls()

2011-07-14 Thread Joe Felsenstein


Paolo Piras wrote --


Citing
http://www.r-phylo.org/wiki/HowTo/Ancestral_State_Reconstruction:
Using Felsenstein's (1985) phylogenetic independent
contrasts (pic); This is also a Brownian-motion based
estimator, but it only takes descendants of each node
into account in reconstructing the state at that node.
More basal nodes are ignored.

I  THINK that, on the opposite, more basal nodes are
NOT ignored in gls and for this reason results can
differ slightly
I'm wrong?


The contrast algorithm if continued to the root,
makes the correct ancestral reconstruction for the root.
You are correct that values for higher nodes in the
tree are not the correct ML reconstruction for those
nodes.  If the tree is rerooted at any interior node
and the algorithm used for that, then that node's
reconstruction will be correct.  There are ways of
re-using information so that the total effort of doing this
for all interior nodes will be no worse than about twice
that of a single pass through the tree.

However people may prefer to use PGLS, which if
properly done should give the proper estimates for
all nodes.  There is some discussion of this in
Rohlf's 2001 paper in Evolution.

Joe

Joe Felsenstein, j...@gs.washington.edu
 Dept. of Genome Sciences, Univ. of Washington
 Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] Simulating BM data on phylogeny

2011-05-20 Thread Joe Felsenstein

Ted wrote --

 Based on zillions of BM simulations we have done with DOS PDSIMUL,  
 you should never see a shift in mean value (versus starting value  
 at root of tree), unless you are intentionally modeling a trend.   
 Something must be wrong.

in response to Dean Adams:

 For these simulations I generate a tree (in this
 case a
 perfectly-balanced tree) and simulate 100 data sets on the same
 phylogeny using a particular initial BM rate parameter (sigma).
...
 However, the most curious finding is that for all methods, as sigma
 increases, so too does the mean trait value across the tips (and
 the
 converse occurs as sigma decreases). This observation is curious to
 me,
 as one should not see a predictable shift in the mean under
 Brownian
 motion.


Is it possible that the simulation is doing Brownian Motion on some
other scale, such as a log scale?   If one does BM on the logs and then
looks at the original phenotype scale, you *would* expect that as the
variance among species increases, so does the mean of the species
means.

Joe

Joe Felsenstein  j...@gs.washington.edu
  Dept of Genome Sciences and Dept of Biology, Univ. of Washington,  
Box 5065, Seattle Wa 98195-5065


[[alternative HTML version deleted]]

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] Bootstrap values and NJ when there is no genetic distance between samples

2011-05-09 Thread Joe Felsenstein


Emmanuel wrote:


Is it a problem with ties or with identical sequences? I guess you can
solve the latter easily (eg, using the haplotype function in  
pegas), and

this will solve the vast majority of ties. Other cases of ties will
certainly not result in such high bootstrap values (that's my  
intuition).


My intuition disagrees with this.  If several sequences are nearly
identical, but each differs from their consensus at  K  sites,
them if the tree-making algorithm does not randomise
addition order of species (or otherwise somehow
randomize the resolution of ties), there are likely
to be artificially high bootstrap values even with
nonzero branch lenghs.

1. Dropping identical sequences is a good thing to do, especially  
with a distance-based method.


One issue is that after bootstrap sampling, some
sequences that were not identical may become
identical.  So the tree-making method ought to
be able to handle identical sequences.

2. A high bootstrap support (or another form of support) associated  
with a zero-length branch is an indication that something's wrong  
there.


Again, it can be a problem even when the branch lengths
are nonzero.

Joe

Joe Felsenstein, j...@gs.washington.edu
 Dept. of Genome Sciences, Univ. of Washington
 Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] Felsentsein's contrast LRT

2011-04-26 Thread Joe Felsenstein


Scott Chamberlain wrote:

In Felsenstein's 2008 AmNat paper he states Likelihood ratio rest  
(LRT) are available of the hypotheis that a set of q characters  
have on phylogenetic covariation with the remaining p - q  
characters. And his software contrast gives only one LRT result in  
the output even if you have say 3 traits. If you are interested in  
contrasts and a LRT for all pairwise relationships of the 3 traits,  
do you just run analyses 3 different times, each with the pair of  
traits you are interested in?


You can do all three tests, but they are not
independent tests.  In Contrast I allow the user to
specify two sets of characters and test whether
they have evolved independently.

If you want to do all three possible tests:
1 versus (2,3),  2 versus (1,3),  and 3 versus (1,2)
keep in mind that those tests are not
independent tests (for example, a strong
correlation between 2 and 3 could cause
two of the tests to show nonindependence.
You would need to run Contrast three times
to do those tests.

Perhaps what is needed is a test that
finds the two sets to divide the characters
into, sets that are as much independent as
possible.   I have no immediate ideas how.

J.F.

Joe Felsenstein, j...@gs.washington.edu
 Dept. of Genome Sciences, Univ. of Washington
 Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] phylogenetically-informed Reduced Major Axis regression in R?

2011-04-20 Thread Joe Felsenstein


Liam said:

Just calculating the slope is straightforward.  For tree and column  
vectors x  y (in order tree$tip.label):


The relevant point to keep in mind is that once you
have made maximum likelihood estimates of the means,
variances and covariances of the variables, the
Reduced Major Axis is simply a function of these,
and its ML estimate is that function of the ML estimates
of the covariances.  You don't need to do any
separate ML estimation for the RMA.

If you want to test hypotheses about the RMA,
if you can recast them as hypotheses about the
slopes and correlations (say that the slope is
zero) then the test can be done there, and no
separate test of the RMA is needed.

In the next release of my program Contrast in
PHYLIP, I will have an option to print out the
RMA and its other axes, which did not involve
anything more complicated than computing them
from the covariances that it was already estimating.

Joe

Joe Felsenstein, j...@gs.washington.edu
 Dept. of Genome Sciences, Univ. of Washington
 Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] Simulate node ages given known topology

2011-04-12 Thread Joe Felsenstein


Gene Hunt said:

Brian Sidlauskas did this in an Evolution paper in 2008 (based on  
Yang and Rannala 1997):


Sidlauskas, Brian L. 2008. Continuous and arrested morphological  
diversification in sister clades of characiform fishes: a  
phylomorphospace approach. Evolution 62(12): 3135-3156.


He had a morphological tree without fossil or molecular branch  
lengths.  The branch lengths were based on rates of speciation,  
extinction and sampling, and so it would be a different approach  
than the coalescent theory, I believe.   I think Brian has R code  
for it; you may want to contact him directly.


In my 2004 book, on pages  570-571 I note the use of a time  
transformation that makes simulation of birth-death processes easy.   
This is based on Bruce Rannala's 1997 work.


If you want to start out with the first fork in the tree at time T  
ago, you want to take your N tip species and choose uniformly from   
2, 3, ..., N-1 to choose the number of those tips that are descended  
from the left branch of the fork.  Say this is called K.   Then you  
need to simulate a birth-death process for K tips (the subtree  
descended from the left fork), and another for N-K tips for the right  
fork.   The result will be a fairly quick algorithm for simulating a  
BD process that has its oldest fork T units of time ago and leads to  
exactly N tips.


If instead you just want the process to start T units of time ago,  
and have the oldest fork occur some time after that (determined by  
the BD process), then that is even easier using the time- 
transformation trick.


I do not know the R packages well enough to know whether one of them  
implements this, but I am sure someone will comment on that.


Joe

Joe Felsenstein, j...@gs.washington.edu
 Dept. of Genome Sciences, Univ. of Washington
 Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


[R-sig-phylo] Fwd: Simulate node ages given known topology

2011-04-12 Thread Joe Felsenstein


Andy Rominger wrote:


Thanks very much Tanja and Joe,

I think these suggestions for sorting will be very useful; we were  
trying to
do something similar (but very hacker-ish) and with this  
theoretical basis

I'm sure we'll pull it off.

And for BD, the time transformation should be perfect.


It is a very useful tool.   If people cite it, they should
check Bruce Rannala's paper to see if it is there too.
Some closely-related math certainly is.

One thing that computing the transformation for various
values of B and D (birth and death rates, called lambda
and mu in the derivation in my book) is that it gives you
a direct idea of how many groups born at each time
are missing from the tree owing to having gone extinct
before reaching the present day.

J.F.

Joe Felsenstein, j...@gs.washington.edu
 Dept. of Genome Sciences, Univ. of Washington
 Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] Multistate Trait Polymorphism

2011-04-07 Thread Joe Felsenstein

Luke Harmon said --

 Yes Emmanuel is correct, fitDiscrete does not deal with polymorphic data. I 
 have a fix that I made for a specific project that I'm sending to Charles, 
 if anyone else is interested email me off-list. It's very clunky.

I suspect this is not just a technical programming issue or a matter of 
standardizing formats of files, but depends on what you want to assume about 
the mode of evolution of a polymporphic character.  Not a trivial matter at 
all, and not one where you just want to accept any old arbitrary rule.

For example there is a very old (1967) parsimony method called polymorphism 
parsimony but it makes specific assumptions -- namely that polymorphism is 
hard to retain along a lineage, easy to lose but hard to regain.

So do you want assume that, or what?

Joe

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] Branch and Bound Maximum Parsimony (Matthew Vavrek)

2011-03-28 Thread Joe Felsenstein

Regarding finding all most parsimonious trees by branch-and-bound:

Program Penny in my PHYLIP package could be called from within R,
driven by batch scripts.  You'd have to make them yourself, but it's
not hard.  However Penny can handle only 0/1 characters, and if there
is a multifurcation it will return all the binary trees compatible with
that.  It is also much slower than PAUP*.

It would get you a file of Newick trees.

Joe

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] How to detect phylogenetic signal (lambda) in one unscaled trait?

2011-03-22 Thread Joe Felsenstein

Ted wrote:

 Following on that, various papers (I can't remember the references)
 have argued that imagining Brownian-like evolution of body size on a
 log scale seems reasonable.  That is, it should be equally easy for an
 elephant's body size to evolve 10% as for a mouse's body size to
 evolve 10%, and to analyze that you want everybody on a log scale. 
 Extending this, you would want to use log(Y/X) or log(Y/[X raised to
 some allometric slope]).

It's just easier to put all variables onto their log scales, so you
have log(X), log(Y), log(Z) and then the allometric stuff just
corresponds to linear combinations there, which you already have
machinery to do.

The recommendation to use log scales is a very old one:  I talk
about it in my Theoretical Evolutionary Genetics free e-text.
But is older than that.  Falconer has a whole chapter on Scale
in his 1960 Introduction of Quantitative Genetics.   Sewall
Wright has a discussion of it in Chapter 10 of his 1968 first
volume of Evolution and the Genetics of Populations (see pages
227ff.).  But it is older than those -- for Wright also says (p. 228):
Galton, as long ago as 1879, noted that the logarithms of measurements
of organisms may be more appropriate than the measurements
themselves on the hypothesis that growth factors tend to contribute
constant percentage increments rather than constant absolute ones.
The old biometrical types of the 1930s and 1940s knew all about
this (though taking logarithms was tedious).  It is only the more
recent researchers who don't know it.

Joe

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] bootstrapping with boot.phylo()

2011-03-15 Thread Joe Felsenstein

Klaus Schliep wrote --

 I suspect that some elements in this distance matrix are close to 0.75
 than pairwise distances for the bootstrap samples are likely to be
 equal or greater than 0.75. Most models (K80, JC69 etc.) are not
 defined for distances =0.75 and will return Inf or NaN (the 0.75 can
 vary a bit, depending on the substitution model). bionj of course does
 not like building trees from infinite values as input. With short
 sequences the variances are of course larger and you are more likely
 to observe this, that's why your larger data set works fine.
 However in this cases NaN or Inf are the correct results!

I often have to deal with users of my PHYLIP package who
are upset at this happening with large distances when
the sequences are bootstrapped.  (In my package the distance is
set to -1.0 in that case, and it should not be used to make a tree).
NaN is not the correct value, but Inf is -- the correct distance
for (say) the Jukes-Cantor model or the Kimura 2-parameter model
when the sequences differ by more than 75% is (positive) infinity,
since these are inferred to be unrelated sequences.

 It would be nice to catch this error with a try and use only trees
 from finite distance matrices or set infinite values to a large value.
 But one should return a warning as these samples are likely to be
 biased.

Exactly.

J.F.

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] Model-Selection vs. Finding Models that Fit Well

2011-01-31 Thread Joe Felsenstein

Ted said --

 One point for clarification and your further thoughts.
 The way parameterize the OU process in Lavin et al. (2008) it is a
 value of zero (not infinity) that gives a start phylogeny with
 contemporaneous tips.  Sometimes the ML estimate of d (what we call
 it) goes to zero, but more often it may go very small but not zero. 
 In either case, it seems to me you can do a LRT versus a star with one
 d.f.

Can you?  Is the value of zero contained within the range of possible
values, or at its extreme?  I think for LRT you need contained within.

Joe

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] trees to matrix and gamma-statistic

2010-12-02 Thread Joe Felsenstein

Folks --

I believe that the distances predicted from the tree are called patristic 
distance.  They are a quite old concept.  I don't have the literature 
accessible right now but believe they were first described by JS Farris or by 
Jim Rohlf.  Searching on Google using that phrase yields more tools.

Cophenetic correlation is the correlation coefficient (the ordinary Pearson 
correlation) between the observed distances and the patristic distances.

Farris pointed out in the early 1970s that minimizing it chooses the same tree 
as minimizing the unweighted least squares fit of the distances.

Joe
---
Joe Felsenstein, j...@gs.washington.edu
Department of Genome Sciences and Department of Biology
University of Washington
Box 355065
Seattle WA 98195-5065
___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] trees to matrix and gamma-statistic

2010-12-02 Thread Joe Felsenstein


Emmanuel wrote --

2. I understand that the gamma-statistic index of Pybus and Harvey  
applies only to ultrametric trees, there is a similar index for  
trees that are not ultrametric?


The 'ultrametricity' of the tree gives it a temporal feature (if  
done properly, of course), so that methods considering the tempo of  
evolution can be used to analyse such trees. If you cannot get  
ultrametric dated trees, see the package apTreeshape which has  
methods to analyse tree shapes, but the range of inferences will  
usually be less than with dated trees.


To rephrase what he said (but probably not change
the meaning), if the intent of the gamma statistic
is to measure the change through time of the speciation
rate (or the difference between the speciation and
extinction rates) then you need some way to
associate a time with each point in the tree.
Lacking an ultrametric tree, you don't have a
non-arbitrary way to do that.

Joe

Joe Felsenstein, j...@gs.washington.edu
 Dept. of Genome Sciences, Univ. of Washington
 Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] trees to matrix and gamma-statistic

2010-12-02 Thread Joe Felsenstein


Folks --

Now that I have had a chance to look up the history:

1. Sokal and Rohlf in 1962 (in Taxon) introduced the
cophenetic correlation.

2. They had a distance called the dendrogrammatic
distance which was (for the ultrametric trees they
considered), half of the patristic distance.

3. Wartren Wagner mentions patristic distance
in a 1968 review in Bioscience, but does not
claim it.

4. JS Farris's paper showing that the cophenetic
correlation is minimized when we achieve a least
squares fit of tree to distances is in 1969 in
Systematic Zoology.

Joe

Joe Felsenstein, j...@gs.washington.edu
 Dept. of Genome Sciences, Univ. of Washington
 Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] PIC vs. PGLS

2010-09-27 Thread Joe Felsenstein

 Simon Blomberg wrote on 27/09/2010 08:10:

(Ted Garland had written:)

 Another interesting technical point.  In general, PIC and PGLS are the 
 same, especially if you stick with a simple Brownian motion model of 
 character evolution.  However, their complete mathematical identity 
 has not, to my knowledge, been proven.

(Simon Blomberg:)
 I have a proof. I have a paper submitted to Sys. Biol. on the topic.
 
 You can find in my book the R code to get exactly the same coefficients 
 with PICs and PGLS. This works as long as the tree is ultrametric (for 
 equal variance). It's not a formal proof, of course, but a strong suspicion.

I can't speak on behalf of PGLS, but if one uses contrasts,
the ML analysis is of course using an i.i.d. model with zero expectations
of the contrasts.   It is provable that this will be identical (getting
the same P values and estimates) to a REML (reduced or restricted maximum
likelihood) on the full covarying Brownian motion model.   If PGLS gets
identical results to that, then that proves the identity to contrasts analysis.

Joe

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] retrieving simulated ancestral states from sim.char{geiger}

2010-08-04 Thread Joe Felsenstein


Luke wrote:

 The simulations in Geiger do draw from a mvn distribution but - in an 
 unnecessary step - draw random numbers for every branch and use matrix 
 multiplication to get the tip values. This is kind of dumb and needlessly 
 slow but does give you the right answer. Carl is correct that simply drawing 
 from a mvn distribution is faster and equivalent. If I had time I'd rewrite 
 the function - maybe I will!

Without knowing exactly what your R code is doing, I just want to add to
these (correct) ideas that there is one simple way of getting tip and
interior node values without much more effort.   Work up the tree,
drawing a set of normals (one for each character) to get the net changes in
that branch.   Keep going until you reach the tips.   (If the characters
are not independent, you also need to have on hand a matrix which
contains the square root of the covariances among characters, and multiply
the values at each node by that.)

This way you don't need a nodes x nodes covariance matrix.

Joe

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] Simulate continuous character evolution on phylogenetic trees using different models of evolution

2009-11-05 Thread Joe Felsenstein


Emmanuel said --

We discussed on the list a few months ago about using any  
correlation matrix to simulate characters. At the moment, it's  
tedious to get the correlation matrix from a corStruct object in  
ape, but I'll improve that soon (following several requests) so  
that it'll be straightforward to simulate (continous) traits under  
the available models of evolution.


Keep in mind that one can't always assume that the variances (say
of Brownian motion) are equal in all characters, so the correlations
aren't quite enough.

Joe

Joe Felsenstein, j...@gs.washington.edu
 Dept. of Genome Sciences, Univ. of Washington
 Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] is maximum likeyhood a phylogenetic approach?

2009-09-29 Thread Joe Felsenstein

Leandro Jones replied to Stephane Bouee:

 One could argue that your characters are not defined using a
 phylogenetic criterion; that is, when performing a phylogenetic
 analysis (either by ML or Parsimony), you have to assume that your
 dataset is made of homology hypotheses. That is, one could argue that
 your data are based on similarity relationships rather than on
 hypotheses of historical transformations.
 Having this in mind, it is questionable whether, using these data, any
 method could produce an acceptable estimate of the phylogeny of your
 group of interest.

If one simply measures the same character in different species,
is that not good enough?  My curiousity is because I suspect that
if Stephane had coded his characters 0/1 before trying to analyze
them, they would then be accepted as made of homology hypotheses.

J.F.

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] is maximum likeyhood a phylogenetic approach?

2009-09-29 Thread Joe Felsenstein

When I wrote:

 As what classifications should be, or whether methods should be
 considered as making phylogenetic or phenetic classifications, I have my
 own position, that no one else seems to back (in public, anyway).  I 
 think that we should not think of these trees as classifications, and not 
 call them phylogenetic classifications or phenetic classifications, but 
 consider them as estimates of the phylogeny.  The issue of how to classify 
 is less important anyway.

Emmanuel Paradis responded -

 I have the strong feeling that most users of R and its [phylo]genetics
 packages are interested in the study of evolutionary processes, not in
 classification (I rarely see questions about classification or
 systematics here). So maybe most of us silently back Joe's position.
 
 About the issue of how to classify, I think it is very important. The
 point here is, in my view, that the confusion between classification and 
 evolution greatly hampered the progress of evolutionary biology, but the 
 situation has improved in recent years.

I can't speak for most users of R, but I do suspect that Emmanuel is
right in that there is agreement with this position among many younger
evolutionary biologists.  But it is a sufficiently intimidating atmosphere
for them that they do not usually say that out loud.  I have stuck my neck
out, mostly for the fun of it.  The reactions among many systematists have
been strong -- they are really outraged, and figure that this is just
some arbitrary opinion of mine, which they are (barely) willing to tolerate.
I suppose the matter will become one of open discussion some day.

Anyway, back to R.

J.F.

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] is maximum likeyhood a phylogenetic approach?

2009-09-26 Thread Joe Felsenstein



Elsa et Stéphane Bouee wrote:

I am currently doing a classification analysis on morphologic data  
using

modern methods of morphometry (procruste analysis).

The variables obtained are quantitative and I would like to use  
them in a

phylogenetic approach.

For this purpose I used 2 methods:

1)   maximum likelihood with the software phylips and its  
contml add on;


As the author of Contml, which is a program, not an add- 
on (whatever that

means) I can perhaps comment.


2)   cladistic analysis on quantitative variables, with the  
software TNT

developed by Goloboff, Farris and Nixon.

I am wondering if the max likelihood is really performing a  
phylogenetic

classification or if it is rather a kind of phenetic method. Maybe the
answer may vary according to whom I ask the question …?

I have submitted an article presenting the max likelihood as a  
phylogenetic

classification but the reviewers are challenging this assertion.



Good luck with the reviewers -- I have never been able to influence
people who had strong views about parsimony being better than everything
else.

There's nothing wrong with the Procrustes superposition.  But the  
assumptions
of Contml are that each coordinate is independently changing  
according to

a Brownian motion process, and all at the same rate.

This is a dubious assumption at best.  I have written about this and  
warned

about this for years (Amer J Human Genetics 1973; Evolution 1981; Annual
Review of Ecology and Systematics 1988; article in volume on  
Morphology, Shape,
and Phylogenetics 2002).  I have an extensive discussion of the issue  
in my

2004 book, chapter 24.  In PHYLIP the documentation file  contchars.html
discusses it too.   Ideally it would be best to have some estimate of
the covariation between the characters, and use that to transform  
your coordinates

to independence.  I don't know an easy way to do that unless you have a
known phylogeny and can estimate the covariances along it using  
comparative

methods software.

So the referees are right to be skeptical, but they should be equally  
skeptical of

discrete characters parsimony approaches which make quite similar
assumptions.   The hardest-core cladistics people do not admit this.

As what classifications should be, or whether methods should be
considered as making phylogenetic or phenetic classifications, I have my
own position, that no one else seems to back (in public, anyway).  I  
think that we
should not think of these trees as classifications, and not call them  
phylogenetic
classifications or phenetic classifications, but consider them as  
estimates of the

phylogeny.  The issue of how to classify is less important anyway.

J.F.

Joe Felsenstein, j...@gs.washington.edu
 Dept. of Genome Sciences, Univ. of Washington
 Box 355065, Seattle, WA 98195-5065 USA
___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] PICs: Why force linear regressions through the origin?

2009-05-08 Thread Joe Felsenstein

Gernot Huber asked --

 1. Why do you have to force a linear regression through the origin  
 when using phylogenetically independent contrasts (PICs)?

Because their expectations are zero.  A contrast could be either
X1 - X2 or X2 - X1, for one thing.  The expectations of X1 and X2
are identical so their difference (in either direction) has
expectation zero.

 2. If you suspect an inverse linear relationship, would you transform  
 one axis so you could still force the regression through the origin?  
 If so, what's the best transformation?

The machinery assumes a bivariate normal distribution.  So if, say,
Y = a(1/X) + b, wouldn't that mean you were assuming that Y and 1/X
were distributed as bivariate normal?  If so, the distribution of
X looks wierd when (1/X) gets near zero.  But at least in that case
If so, you could do that transformation (1/X) at the beginning.  Or are
you assuming that Y and X are bivariate normal and then  Y = a(1/X) + b?
That is dicier since X could in principle get down to zero and then
there is a loud explosion on the Y scale.

Assuming  (ln X, ln Y) is bivariate normal seems safer, especially
if the characters have natural limits at zero.  Then the regression
might just be a simple linear one.

Joe

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] Parametric bootstraping

2009-03-16 Thread Joe Felsenstein

Fernando Fernandez wrote:

 first time I really have a question I can't solve
 Departing from the hypothesis that if anything is doable it can be
 done with R
 
 How can I do Parametric bootstrapping on a tree?
 
 That is I want to assume a phylogenetic tree as a good model,
 and then then shuffle leaves (with haplotypes of a different gene for example)
 to compare the randomised distribution to the a priori tree.
 
 (I cannot compare topographies of two trees because I have no good
 support for one of them,
 and I still presume this alternative is a good approach)

Some people answered questioning this as a good resampling
procedure.  Just to be the Voice Of Orthodoxy, let me add ...

Whatever its merits, it isn't the same as parametric
bootstrapping.  That involves simulating data along a tree,
not shuffling data among leaves.

J.F.

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo