# Re: [R-sig-phylo] HKY GTR distances

```Emmanuel wrote:

>
> There is no distance formula for HKY or GTR model. For GTR, Rodrı́guez et
> al. developed a procedure to calculate a distance (also in Yang's 2006
> book). An example is given below with the woodmouse:
>
> matlog <- function(x) {
>     tmp <- eigen(X)
>     V <- tmp\$vectors
>     U <- diag(log(tmp\$values))
>     V %*% U %*% solve(V)
> }
>
> tr <- function(x) sum(diag(x))
>
> data(woodmouse)
>
> PI <- diag(base.freq(woodmouse[1:2, ]))
> Ft <- Ftab(woodmouse[1:2, ])
> F <- Ft/sum(Ft)
> X <- solve(PI) %*% F
> -tr(PI %*% matlog(X))
>
> You have to call this code for each pair of sequences (or wrap it in a
> function).
>
> For HKY, Yang mentioned a procedure Rzhetsky & Nei (1994, J Mol Evol).
>```
```
Strictly speaking there *is* a formula for GTR, which I think may be
equivalent to the above.  The formula is given in my book "Inferring
Phylogenies" on page 209 as equation 13.24.  Note that for both of
these, the equilibrium frequencies of the bases are estimated from the
empirical frequencies in the two sequences.  That means that for each
distance, the equilibrium frequencies are somewhat different, as
different sequences are being used to estimate the base frequencies.
We are not, in the use of these formulas, estimating one overall set
of equilibrium frequencies and then using that for all the the
distances in our data set.

The Rzhetsky-Nei 1994 distance formula is, I believe, an
approximation, though probably a very good one.  It is not quite the
same as the estimate you would get by maximizing the likelihood.

Distance formulas that involve empirically estimated base frequencies,
which all of these do, have in common the problem that one either
estimates those separately for each pair of sequences, or jointly
estimates them from the whole dataset, without using a tree in the
process.

J.F.
----
Joe Felsenstein         j...@gs.washington.edu
Department of Genome Sciences and Department of Biology,
University of Washington, Box 355065, Seattle, WA 98195-5065 USA

_______________________________________________
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/```