Re: [R-sig-phylo] Substitution model rate matrix - how to read it?

2021-11-27 Thread Emmanuel Paradis
Hi Martin,

Rates are generally relative quantities in phylogenetics. You cannot estimate
the substitution rate(s) together with the branch lengths of the tree, because
a branch length is already the product of rate and time and the data only
inform that product; but rates can be estimated relative to each other. That's
why one rate is fixed to 1 in the GTR model.
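A minimal sketch of why only the product of rate and time is identifiable. Python is used here only for illustration, and `jc69_p_same` is a hypothetical helper, not a phangorn function. Under JC69, with the rate matrix scaled so that `rate` is the expected number of substitutions per site per unit time, the probability that a site is unchanged depends on rate and t only through their product, the branch length:

```python
import math

def jc69_p_same(rate, t):
    """Probability that a site shows the same base after time t under JC69.

    Hypothetical helper. Assumes Q is normalized so that `rate` is the expected
    number of substitutions per site per unit time; rate * t is the branch length.
    """
    b = rate * t  # branch length in expected substitutions per site
    return 0.25 + 0.75 * math.exp(-4.0 * b / 3.0)

# Doubling the rate and halving the time gives the same probability,
# so from sequence data alone only the product rate * t can be estimated.
p1 = jc69_p_same(rate=1.0, t=0.1)
p2 = jc69_p_same(rate=2.0, t=0.05)
print(p1, p2, math.isclose(p1, p2))
```

Any pair (c * rate, t / c) gives the same transition probabilities, which is why branch lengths are reported in substitutions per site and the individual rates only relative to one another.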

In the JC69 model, the (single) rate cannot be estimated unless you fix the
branch lengths, which can be done in phangorn::optim.pml with the options 
optRate=TRUE, optEdge=FALSE (if you set both options to TRUE, you'll get a 
warning). If you do this, the output will include an element '...$rate' with 
the estimated substitution rate.
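As a toy two-sequence analogue of fixing the branch lengths (this is not what optim.pml does internally; the data and helper names below are hypothetical), once the elapsed time t is known the JC69 rate becomes estimable: the maximum-likelihood branch length is the JC69 distance d = -3/4 ln(1 - 4p/3), where p is the proportion of differing sites, so the rate estimate is d / t.

```python
import math

def jc69_distance(n_sites, n_diff):
    """ML branch length b = rate * t for two sequences under JC69 (hypothetical helper)."""
    p = n_diff / n_sites
    if p >= 0.75:
        raise ValueError("JC69 distance is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def jc69_loglik(rate, t, n_sites, n_diff):
    """Log-likelihood (up to a constant) of n_diff differing sites out of n_sites."""
    e = math.exp(-4.0 * rate * t / 3.0)
    p_same = 0.25 + 0.75 * e  # site unchanged
    p_diff = 0.75 - 0.75 * e  # site changed to any of the three other bases
    return (n_sites - n_diff) * math.log(p_same) + n_diff * math.log(p_diff)

# Hypothetical data: 1000 aligned sites, 100 differences, known time t = 0.5
n_sites, n_diff, t = 1000, 100, 0.5
rate_hat = jc69_distance(n_sites, n_diff) / t  # only possible because t is fixed

# Sanity check: moving the rate 10% either way lowers the likelihood
best = jc69_loglik(rate_hat, t, n_sites, n_diff)
for r in (0.9 * rate_hat, 1.1 * rate_hat):
    assert jc69_loglik(r, t, n_sites, n_diff) < best

print(round(rate_hat, 4))  # prints 0.2147
```

If t were not fixed, every (rate, t) pair with the same product would fit equally well, which is the situation described above for ordinary tree inference.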

The rate matrix can have elements larger than 1; it is the sum of each row
that equals 0. Generally in phylogenetics there is no time information, so
the branch lengths are interpreted as expected numbers of substitutions per
site along each branch. You can still compute the matrix exponential
(e^(Q*t)), but the resulting probabilities cannot be interpreted easily.
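A minimal sketch of how a full Q can be obtained from a matrix like the one in the question, assuming the standard GTR parameterization. My understanding is that the printed "Rate matrix" holds the symmetric exchangeabilities r_ij with a zero diagonal, not Q itself, but I have not checked phangorn's internals. Off the diagonal, q_ij = r_ij * pi_j; each diagonal entry is set so that its row sums to 0; and Q is rescaled so that the mean rate -sum_i pi_i q_ii equals 1, which makes t a branch length in expected substitutions per site. The equal base frequencies are an assumption for illustration (the fitted frequencies are not given in the thread), and the +I and +G components are ignored. The matrix exponential is a plain scaling-and-squaring Taylor series, to keep the example dependency-free.

```python
BASES = "acgt"

# Exchangeabilities printed by phangorn for GTR+I+G on the Laurasiatherian data
# (symmetric; the g<->t rate is the one fixed to 1).
R = [
    [0.0,        3.0009884, 11.8735854, 2.608831],
    [3.0009884,  0.0,       0.5162325,  21.771813],
    [11.8735854, 0.5162325, 0.0,        1.0],
    [2.608831,   21.771813, 1.0,        0.0],
]
# ASSUMPTION: equal base frequencies, for illustration only.
PI = [0.25, 0.25, 0.25, 0.25]

def build_q(r, pi):
    """q_ij = r_ij * pi_j off the diagonal, rows summing to 0, mean rate scaled to 1."""
    n = len(pi)
    q = [[r[i][j] * pi[j] if i != j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        q[i][i] = -sum(q[i])  # diagonal is still 0 here, so this sums the off-diagonal
    mu = -sum(pi[i] * q[i][i] for i in range(n))
    return [[x / mu for x in row] for row in q]

def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def expm(a, terms=20):
    """Matrix exponential by scaling and squaring with a truncated Taylor series."""
    n = len(a)
    norm = max(sum(abs(x) for x in row) for row in a)
    s = 0
    while norm > 0.5:  # halve until the scaled matrix has a small norm
        norm /= 2.0
        s += 1
    scaled = [[x / 2.0 ** s for x in row] for row in a]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, scaled)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(s):  # undo the scaling: e^A = (e^(A / 2^s))^(2^s)
        result = mat_mul(result, result)
    return result

Q = build_q(R, PI)
t = 0.1  # branch length in expected substitutions per site
P = expm([[x * t for x in row] for row in Q])

for base, q_row, p_row in zip(BASES, Q, P):
    print(base, [round(x, 3) for x in q_row], [round(x, 4) for x in p_row])
```

Each row of Q sums to 0 and each row of P(t) sums to 1, even though the exchangeabilities themselves are well above 1. In practice, scipy.linalg.expm in Python or expm::expm in R would replace the hand-written exponential.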

Best,

Emmanuel

- On 26 Nov 21, at 18:04, Martin Fikáček mfika...@gmail.com wrote:

> Hi everybody,
> 
> I am trying to explain the principles of phylogenetics to students
> using R and ran into a simple problem that I cannot solve. It is probably
> something very basic, so sorry if this is a stupid question:
> 
> When checking the details of the models selected for my data by modelTest()
> in phangorn, the rate matrix always includes numbers around 1 or even much
> higher (for example, this is the matrix for the Laurasiatherian data with
> the GTR+I+G model):
> 
> Rate matrix:
>           a          c          g         t
> a  0.000000  3.0009884 11.8735854  2.608831
> c  3.000988  0.0000000  0.5162325 21.771813
> g 11.873585  0.5162325  0.0000000  1.000000
> t  2.608831 21.7718125  1.0000000  0.000000
> 
> For some simple models it gives just 0 or 1, for example this for JC:
> 
> Rate matrix:
>  a c g t
> a 0 1 1 1
> c 1 0 1 1
> g 1 1 0 1
> t 1 1 1 0
> 
> I would normally expect the rate matrix to have values lower than 1 and to
> sum up to 0. Then it would also make sense to use it for calculating the
> probability matrix as e^(Q*t). I wanted to illustrate the meaning of the
> rate matrix estimated for real data to the students in this way, which is
> how I noticed that the phangorn output is different, and I cannot figure
> out why.
> 
> Thanks for any hint!
> 
> Martin
> 
> --
> *Martin Fikáček (費卡契) MSc. PhD.*
> *Department of Biological Sciences*
> *National Sun Yat-sen University*
> *No. 70, Lienhai Rd., Kaohsiung 80424, Taiwan*
> *E-mail: *mfika...@gmail.com, mfika...@mail.nsysu.edu.tw
> *Phone: *(+886) 75252000 # 3622
> *Website: *www.cercyon.eu
> 
> ___
> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


