Re: [R-sig-phylo] Link between Mahalanobis distance and PGLS

2019-01-04 Thread Joe Felsenstein
Guillaume Louvel asked:

>
> So my first question is: can we directly apply the Mahalanobis distance
> to measure a kind of "phylogeny-corrected" distance between two vectors of
> trait values for a list of species? Since we assume Brownian motion,
> we know these vectors should be drawn from a multivariate normal
> distribution with known covariance matrix. The Mahalanobis distance
> therefore seems perfectly appropriate to me. Is that the case?

It is appropriate.  In fact, this is in effect what
regressing contrasts in trait Y on contrasts in
trait X is doing.  One can alternatively use a
multivariate regression approach, which is what
Alan Grafen (1989) did, and the results are
the same either way (in the simplest cases).

Note that although the contrasts can be
treated as independent observations, that is
not true for the tip species values -- the
Grafen "Phylogenetic Regression" does not
treat the tip values as independent, and for
the same reason pairwise distances between
tips are not independent.
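
The agreement between regressing contrasts and the GLS regression can be
checked numerically. The following is only a minimal sketch in Python with
NumPy: the 4-species tree ((A:1,B:1):1,(C:1,D:1):1), its Brownian-motion
covariance matrix V, and the trait values are all made up for illustration,
and the contrasts are hard-coded for that one tree rather than computed by a
general algorithm.

```python
import numpy as np

# Hypothetical tree ((A:1,B:1):1,(C:1,D:1):1). Under Brownian motion the
# covariance of two tips is proportional to their shared branch length.
V = np.array([
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 1.0],
    [0.0, 0.0, 1.0, 2.0],
])

# Made-up trait values for tips A, B, C, D (illustration only).
x = np.array([1.0, 2.0, 4.0, 7.0])
y = np.array([2.1, 2.9, 6.2, 9.0])


def contrasts(t):
    """Standardized contrasts for the balanced 4-tip tree defined above."""
    a, b, c, d = t
    c1 = (a - b) / np.sqrt(1.0 + 1.0)  # A vs B, branch lengths 1 and 1
    c2 = (c - d) / np.sqrt(1.0 + 1.0)  # C vs D, branch lengths 1 and 1
    # Node values are weighted means of their descendants; each internal
    # branch is lengthened by 1*1/(1+1) = 0.5, giving length 1.5.
    ab, cd = (a + b) / 2.0, (c + d) / 2.0
    c3 = (ab - cd) / np.sqrt(1.5 + 1.5)
    return np.array([c1, c2, c3])


# Regression of Y contrasts on X contrasts, forced through the origin.
cx, cy = contrasts(x), contrasts(y)
slope_contrasts = (cx @ cy) / (cx @ cx)

# GLS regression of Y on X (with intercept) using the tip covariance V.
X = np.column_stack([np.ones_like(x), x])
V_inv = np.linalg.inv(V)
slope_gls = np.linalg.solve(X.T @ V_inv @ X, X.T @ V_inv @ y)[1]

print(slope_contrasts, slope_gls)
```

On this toy tree the two slopes coincide, which is the "simplest case"
referred to above.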


>
> I don't want to do a statistical test per se; I am more interested in
> ranking many traits according to their distance from a reference pattern.

I am unclear about what that means.

>
> My second, subsidiary question is: can I apply this Mahalanobis distance
> if my traits are binary (e.g. presence-absence of some sequence in the
> genomes)? In that case I know that my trait is not multivariate normal,
> but considering that I have millions of traits, shouldn't I expect the
> whole set to have some normal characteristics?

Basically no.  Although people have approximated
binary traits by Gaussian variables (I think Paul
Harvey and Mark Pagel did in their 1991 book),
it is much more appropriate to use a threshold
model.  See my 2012 paper in American Naturalist
or the earlier 2005 sketch of the method in
Proc. Royal Society of London series B.

A good paper agonizing about all this is:

Maddison WP & FitzJohn RG. 2015. The unsolved challenge to
phylogenetic correlation tests for categorical characters. Systematic
Biology 64: 127–136

though I'd say that the problem is not as "unsolved" as they think.
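
To make the threshold model mentioned above a little more concrete, here is
a minimal Python/NumPy sketch of its generative side only: an unobserved
continuous "liability" evolves by Brownian motion along the tree, and the
binary trait records whether the liability exceeds a threshold. The tree,
covariance matrix V, root value, rate, and threshold are all assumptions
chosen for illustration; the inference step (which needs MCMC) is not shown.

```python
import numpy as np

# Hypothetical tree ((A:1,B:1):1,(C:1,D:1):1) and its Brownian-motion
# covariance matrix (shared branch lengths), assumed for illustration.
V = np.array([
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 1.0],
    [0.0, 0.0, 1.0, 2.0],
])

rng = np.random.default_rng(1)
n_traits = 5

# Each row is the liability of one trait at tips A..D, drawn from the
# multivariate normal implied by Brownian motion (root value 0, rate 1,
# both assumed).
liability = rng.multivariate_normal(mean=np.zeros(4), cov=V, size=n_traits)

# The observed binary trait is 1 where the liability exceeds the threshold.
threshold = 0.0  # assumed value
binary = (liability > threshold).astype(int)

print(binary)
```

The observed 0/1 patterns are correlated among related tips because the
liabilities are, but the binary values themselves are not multivariate
normal.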



> Finally, if none of the approaches above is justified, is there a
> multivariate phylogenetic method for discrete/binary traits? Some kind
> of adapted phylogenetic PCA?

See above (the threshold model).  It does require
MCMC, and cannot simply be done with distances.

J.F.

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


[R-sig-phylo] Link between Mahalanobis distance and PGLS

2019-01-04 Thread Guillaume LOUVEL
Dear all,

I read somewhere (but can't find the source again) that fitting a
response variable Y by least squares against a linear combination of
explanatory variables X, given a covariance matrix V (that is, doing
Phylogenetic Generalized Least Squares), is equivalent to minimizing the
Mahalanobis distance between Y and the predicted values, which seems to
make sense to me.
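
As a quick numerical check of that equivalence, here is a minimal
Python/NumPy sketch. The 4-species covariance matrix V and the trait values
are made up for illustration, and the names (mahalanobis_sq, beta_gls) are
hypothetical. It computes the closed-form GLS estimate and confirms that
random perturbations of it never give a smaller squared Mahalanobis distance
between Y and the fitted values.

```python
import numpy as np

# Hypothetical covariance matrix for the tree ((A:1,B:1):1,(C:1,D:1):1)
# under Brownian motion (entries are shared branch lengths).
V = np.array([
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 1.0],
    [0.0, 0.0, 1.0, 2.0],
])

# Made-up trait values (illustration only).
x = np.array([1.0, 2.0, 4.0, 7.0])
y = np.array([2.1, 2.9, 6.2, 9.0])
X = np.column_stack([np.ones_like(x), x])  # intercept and slope columns

V_inv = np.linalg.inv(V)


def mahalanobis_sq(beta):
    """Squared Mahalanobis distance between y and the fitted values X @ beta."""
    r = y - X @ beta
    return float(r @ V_inv @ r)


# Closed-form GLS estimate: beta = (X' V^-1 X)^-1 X' V^-1 y
beta_gls = np.linalg.solve(X.T @ V_inv @ X, X.T @ V_inv @ y)
d_min = mahalanobis_sq(beta_gls)

# Perturb the estimate in random directions and record the distances.
rng = np.random.default_rng(0)
perturbed = [
    mahalanobis_sq(beta_gls + rng.normal(scale=0.1, size=2))
    for _ in range(100)
]

print(beta_gls, d_min, min(perturbed))
```

Because V is positive definite, the distance is a convex quadratic in the
coefficients, so the GLS estimate is its unique minimizer.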

So my first question is: can we directly apply the Mahalanobis distance
to measure a kind of "phylogeny-corrected" distance between two vectors of
trait values for a list of species? Since we assume Brownian motion,
we know these vectors should be drawn from a multivariate normal
distribution with known covariance matrix. The Mahalanobis distance
therefore seems perfectly appropriate to me. Is that the case?
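
Concretely, the distance in question could be computed as in the following
Python/NumPy sketch. The covariance matrix V, the trait vector, and the
reference pattern are made-up values for a hypothetical 4-species tree.
Whether the traits should first be centered or rescaled is a separate choice
that this sketch does not address.

```python
import numpy as np

# Hypothetical Brownian-motion covariance matrix for the tree
# ((A:1,B:1):1,(C:1,D:1):1); entries are shared branch lengths.
V = np.array([
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 1.0],
    [0.0, 0.0, 1.0, 2.0],
])

trait = np.array([1.0, 2.0, 4.0, 7.0])      # made-up trait values
reference = np.array([1.5, 1.5, 5.0, 6.0])  # made-up reference pattern

diff = trait - reference

# Squared Mahalanobis distance diff' V^-1 diff, solving V z = diff
# instead of forming the inverse explicitly.
d_mahal = np.sqrt(float(diff @ np.linalg.solve(V, diff)))

# Plain Euclidean distance, which ignores the phylogeny, for comparison.
d_euclid = float(np.linalg.norm(diff))

# Equivalent view: whiten the difference with the Cholesky factor L
# (V = L L') and take the ordinary Euclidean norm.
L = np.linalg.cholesky(V)
whitened = np.linalg.solve(L, diff)

print(d_mahal, d_euclid, np.linalg.norm(whitened))
```

Ranking many traits against the same reference would repeat the same
calculation with one factorization of V shared across all traits.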

I don't want to do a statistical test per se; I am more interested in
ranking many traits according to their distance from a reference pattern.

My second, subsidiary question is: can I apply this Mahalanobis distance
if my traits are binary (e.g. presence-absence of some sequence in the
genomes)? In that case I know that my trait is not multivariate normal,
but considering that I have millions of traits, shouldn't I expect the
whole set to have some normal characteristics?

I know that there is Pagel's 1994 method for binary traits; however,
it seemed to me that a distance-based method would be faster and would
allow me to rank my candidates.

Finally, if none of the approaches above is justified, is there a
multivariate phylogenetic method for discrete/binary traits? Some kind
of adapted phylogenetic PCA?

Thanks a lot for your help,

Guillaume



