Re: [R-sig-phylo] HKY GTR distances

2017-02-03 Thread Joe Felsenstein
Emmanuel wrote:

>
> There is no distance formula for HKY or GTR model. For GTR, Rodrı́guez et
> al. developed a procedure to calculate a distance (also in Yang's 2006
> book). An example is given below with the woodmouse:
>
> matlog <- function(x) {
> tmp <- eigen(X)
> V <- tmp$vectors
> U <- diag(log(tmp$values))
> V %*% U %*% solve(V)
> }
>
> tr <- function(x) sum(diag(x))
>
> data(woodmouse)
>
> PI <- diag(base.freq(woodmouse[1:2, ]))
> Ft <- Ftab(woodmouse[1:2, ])
> F <- Ft/sum(Ft)
> X <- solve(PI) %*% F
> -tr(PI %*% matlog(X))
>
> You have to call this code for each pair of sequences (or wrap it in a
> function).
>
> For HKY, Yang mentioned a procedure Rzhetsky & Nei (1994, J Mol Evol).
>

Strictly speaking there *is* a formula for GTR, which I think may be
equivalent to the above.  The formula is given in my book "Inferring
Phylogenies" on page 209 as equation 13.24.  Note that for both of
these, the equilibrium frequencies of the bases are estimated from the
empirical frequencies in the two sequences.  That means that for each
distance, the equilibrium frequencies are somewhat different, as
different sequences are being used to estimate the base frequencies.
We are not, in the use of these formulas, estimating one overall set
of equilibrium frequencies and then using that for all the the
distances in our data set.

The Rzhetsky-Nei 1994 distance formula is, I believe, an
approximation, though probably a very good one.  It is not quite the
same as the estimate you would get by maximizing the likelihood.

Distance formulas that involve empirically estimated base frequencies,
which all of these do, have in common the problem that one either
estimates those separately for each pair of sequences, or jointly
estimates them from the whole dataset, without using a tree in the
process.

J.F.

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] HKY GTR distances

2017-02-03 Thread Emmanuel Paradis

Dear Andreas,

There is no distance formula for HKY or GTR model. For GTR, Rodrı́guez 
et al. developed a procedure to calculate a distance (also in Yang's 
2006 book). An example is given below with the woodmouse:


matlog <- function(x) {
tmp <- eigen(X)
V <- tmp$vectors
U <- diag(log(tmp$values))
V %*% U %*% solve(V)
}

tr <- function(x) sum(diag(x))

data(woodmouse)

PI <- diag(base.freq(woodmouse[1:2, ]))
Ft <- Ftab(woodmouse[1:2, ])
F <- Ft/sum(Ft)
X <- solve(PI) %*% F
-tr(PI %*% matlog(X))

You have to call this code for each pair of sequences (or wrap it in a 
function).


For HKY, Yang mentioned a procedure Rzhetsky & Nei (1994, J Mol Evol).

Best,

Emmanuel

Rodrı́guez F., Oliver J. L., Marı́n A. & Medina J. R. 1990. The general 
stochastic model of nucleotide substitution. Journal of Theoretical

Biology 142: 485–501.


Le 01/02/2017 à 23:52, kolte...@rub.de a écrit :

Dear Phylothusiasts,
I need to compare multiple substitution models side-by-side (species
clustering stuff by distances only). Unfortunately, I am not aware of an
implementation of HKY and GTR distance computations using R. Maybe there
is some Github code or something else that I have been missing?
I do not want to build a phylogenetic tree. I can not use PAUP (>500 big
fasta files).
Any ideas are greatly appreciated.
Best wishes,
Andreas

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at
http://www.mail-archive.com/r-sig-phylo@r-project.org/







___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] HKY GTR distances

2017-02-02 Thread Jacob Berv
Can you do this in MEGA?
Jake

> On Feb 1, 2017, at 5:52 PM, kolte...@rub.de  
> wrote:
> 
> Dear Phylothusiasts,
> I need to compare multiple substitution models side-by-side (species 
> clustering stuff by distances only). Unfortunately, I am not aware of an 
> implementation of HKY and GTR distance computations using R. Maybe there is 
> some Github code or something else that I have been missing?
> I do not want to build a phylogenetic tree. I can not use PAUP (>500 big 
> fasta files).
> Any ideas are greatly appreciated.
> Best wishes,
> Andreas
> 
> ___
> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


[R-sig-phylo] HKY GTR distances

2017-02-02 Thread kolte...@rub.de

Dear Phylothusiasts,
I need to compare multiple substitution models side-by-side (species 
clustering stuff by distances only). Unfortunately, I am not aware of an 
implementation of HKY and GTR distance computations using R. Maybe there 
is some Github code or something else that I have been missing?
I do not want to build a phylogenetic tree. I can not use PAUP (>500 big 
fasta files).

Any ideas are greatly appreciated.
Best wishes,
Andreas

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/