Re: [R-sig-phylo] model averaging using brownie.lite

krzysztofbartoszek Thu, 05 Sep 2019 22:50:09 -0700

Dear Karla,

Concerning the n that should be passed to AICc, I agree with what was
previously written: this is a complex, case-by-case issue, and there are a
number of ways to define what could be used in place of n.

However, at least for one-dimensional BM and OU models, it seems to me that
if your tree is not too small, then taking n or n-1, as written by Liam,
should be OK for model selection purposes. At least I could not find issues
with this in:

"Phylogenetic effective sample size", J. Theor. Biol. 407: 371-386, 2016.

Best wishes,
Krzysztof Bartoszek

On 5 September 2019 at 18:41, r-sig-phylo-request@r-project. wrote:

Today's Topics:

   1. model averaging using brownie.lite (Karla Shikev)
   2. Re: model averaging using brownie.lite (Liam Revell)
   3. Re: model averaging using brownie.lite (Karla Shikev)
   4. Re: model averaging using brownie.lite (Liam Revell)
   5. Re: model averaging using brownie.lite (Brian O'Meara)
   6. Re: model averaging using brownie.lite (Cecile Ane)
   7. Rate of trait evolution through time (Gregory Mutumi)

------------------------------

Message: 1
Date: Thu, 5 Sep 2019 10:13:34 -0300
From: Karla Shikev <karlashi...@gmail.com>
To: r-sig-phylo@r-project.org
Subject: [R-sig-phylo] model averaging using brownie.lite
Message-ID: <CAPU4_4pkQzH8wnzV7EgiWTawAzNP
Content-Type: text/plain; charset="utf-8"

Dear all,

I've been trying to use brownie.lite to implement the tutorial available
here (treethinkers.org) to calculate model-averaged rates of evolution and
for model selection (1 versus 2 rates). However, the current version of
phytools 0.6-99 won't produce AICc estimates. Does anyone know a way around
this? Any help would be greatly appreciated.

thanks a bunch,

Karla

------------------------------

Message: 2
Date: Thu, 5 Sep 2019 13:27:12 +0000
From: Liam Revell <liam.rev...@umb.edu>
To: Karla Shikev <karlashi...@gmail.com>, "r-sig-phylo@r-project.org"
    <r-sig-phylo@r-project.org>
Subject: Re: [R-sig-phylo] model averaging using brownie.lite
Message-ID: <a9dfbf84-f333-ca99-074c-604c2
Content-Type: text/plain; charset="utf-8"

Dear Karla.

You could try & create your own logLik method for the object class
"brownie.lite" as follows:

## method
logLik.brownie.lite<-function(object,...){
    lik<-setNames(
        c(object$logL1,object$logL.multiple),
        c("single-rate","multi-rate"))
    attr(lik,"df")<-c(object$k1,object$k2)
    lik
}
## fit model
fit<-brownie.lite(tree,x)
## use it
logLik(fit)
AIC(fit)

All the best, Liam

Liam J. Revell
Associate Professor, University of Massachusetts Boston
Profesor Asistente, Universidad Católica de la Ssma Concepción
web: faculty.umb.edu, www.phytools.org

Academic Director UMass Boston Chile Abroad (starting 2019): www.umb.edu

------------------------------

Message: 3
Date: Thu, 5 Sep 2019 10:49:24 -0300
From: Karla Shikev <karlashi...@gmail.com>
Cc: "r-sig-phylo@r-project.org" <r-sig-phylo@r-project.org>
Subject: Re: [R-sig-phylo] model averaging using brownie.lite
Message-ID: <CAPU4_4oJ6NOz3aRHre5T1PfJ0DOG
Content-Type: text/plain; charset="utf-8"

Thanks so much, Liam! Just one quick follow-up question: what do you
suggest should be the sample size for transforming AIC into AICc? The
number of tips on the tree?

Karla

------------------------------

Message: 4
Date: Thu, 5 Sep 2019 13:55:53 +0000
From: Liam Revell <liam.rev...@umb.edu>
To: Karla Shikev <karlashi...@gmail.com>
Cc: "r-sig-phylo@r-project.org" <r-sig-phylo@r-project.org>
Subject: Re: [R-sig-phylo] model averaging using brownie.lite
Message-ID: <d9de7e79-a712-7214-2af4-ed53c
Content-Type: text/plain; charset="utf-8"

Dear Karla.

In my opinion, it is probably correct to use the number of tips on the
tree as the sample size for AICc when estimating the Brownian rate, as
the number of independent pieces of information is n-1, just like with
an ordinary variance. For other parameters in phylogenetic comparative
analyses, the effective sample size may be different, however.

All the best, Liam

------------------------------
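
A minimal sketch of the AICc step discussed in Messages 3 and 4, assuming the number of tips is used as n (Message 5 explains why that choice is not settled). The functions `aicc` and `akaike_weights` are hypothetical helpers written for illustration, not part of phytools, and the log-likelihoods below are made-up numbers standing in for a real brownie.lite fit.

```r
## Small-sample corrected AIC: AIC plus the usual 2k(k+1)/(n-k-1) penalty.
## `n` is the sample size; here it is assumed to be the number of tips.
aicc <- function(logL, k, n) {
  stopifnot(all(n - k - 1 > 0))
  -2 * logL + 2 * k + (2 * k * (k + 1)) / (n - k - 1)
}

## Akaike weights from a vector of information-criterion scores.
akaike_weights <- function(ic) {
  delta <- ic - min(ic)
  w <- exp(-0.5 * delta)
  w / sum(w)
}

## Made-up values standing in for a brownie.lite fit on a 50-tip tree:
## log-likelihoods and parameter counts for the two models.
logL <- c("single-rate" = -120.4, "multi-rate" = -117.9)
k    <- c(2, 3)
n    <- 50

scores  <- aicc(logL, k, n)
weights <- akaike_weights(scores)
print(round(scores, 3))
print(round(weights, 3))
```

With the logLik method from Message 2, the inputs could instead come from `lik <- logLik(fit)`, with `as.numeric(lik)` as the log-likelihoods, `attr(lik, "df")` as the parameter counts, and `length(tree$tip.label)` as n. The weights can then be used to average the rate estimates across the two models.

------------------------------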
Message: 5
Date: Thu, 5 Sep 2019 10:55:11 -0400
From: "Brian O'Meara" <omeara.br...@gmail.com>
To: Liam Revell <liam.rev...@umb.edu>
Cc: Karla Shikev <karlashi...@gmail.com>, "r-sig-phylo@r-project.org"
    <r-sig-phylo@r-project.org>
Subject: Re: [R-sig-phylo] model averaging using brownie.lite
Message-ID: <CAKywhkq-CgX=khQnAbztr5s58dSb
Content-Type: text/plain; charset="utf-8"

Sample size is a weird thing in this area for AICc. For comparing DNA
models in something like ModelTest, number of sites is used, but for OU/BM
models, we typically use number of taxa. It's not resolved what's best.

Posada and Buckley (2004, doi.org) have a discussion on this:

Both in the AICc and the BIC descriptions above, the total number of
characters was used as an estimate of sample size. However, effective
sample sizes in phylogenetic studies are poorly understood, and depend on
the quantity of interest (Churchill et al., 1992; Goldman, 1998; Morozov et
al., 2000). Characters in an alignment will often not be independent, so
using the total number of characters as a surrogate for sample size (Minin
et al., 2003; Posada and Crandall, 2001b) could be an overestimate. Using
only the number of variable sites as an estimate of sample size is a more
conservative approach, but could be an underestimate (note that all sites
are used when estimating base frequencies or the proportion of invariable
sites). Indeed, sample size also depends on the number of taxa.
Importantly, sample size can have an effect on the outcome of model
selection with the AICc. In our example above, if we were to use the number
of variable characters (301 sites) as the sample size, instead of the total
number of characters (1927 sites), the best AICc model would not change,
but the second and third AICc models would exchange their rankings.
Furthermore, because the LRT, the AIC, and the BIC strategies rely on large
sample asymptotics, it is also important to decide when a sample should be
considered small. Although the AICc was derived under Gaussian assumptions,
Burnham et al. (1994) found that this second order expression performed
well in product multinomial models for open population capture-recapture.
Burnham and Anderson (2003, p. 66) suggest using this correction when the
sample size is small compared to the number of adjustable parameters, n/K <
40. Alternatively, and because AICc converges to the AIC with increasing
n/K ratios, one could always use the AICc (D. Anderson, personal
communications). Phylogenetic characters are mostly discrete, and the
unconstrained model in phylogenetics is multinomial (Goldman, 1993). One
may think of an alignment of nucleotide characters as a large and sparse
contingency table with 4^T bins, where T is the number of taxa. For large
sample asymptotics to hold in a contingency table every cell should
contain, in general, more than 5 observations (see Agresti, 1990, p. 49,
244–250), which gives a rule of thumb of n/4^T > 5. Clearly, more research
is needed on sample size in phylogenetics.

Beaulieu et al. (2018, doi.org; note my COI as I'm an author on this) did
some simulations on a codon model testing different ways of counting sample
size (number of sites, number of taxa, number of sites * number of taxa,
etc.) and found that number of cells in the matrix (number of sites *
number of taxa) seemed to work best to approximate Kullback-Leibler
distance. For univariate models like that used in brownie.lite, number of
cells is equal to number of taxa (since there's only one column):

We note our use of AICc, as calculated in Burnham and Anderson (2002, p.
66) and as opposed to the standard AIC, in the above model comparisons. At
the outset of our study it was unclear what the appropriate sample size n
is when comparing models of sequence evolution. Building upon the work of
Jhwueng et al. (2014), our simulations suggest that using the number of
taxa times the number of sites as the sample size correction performed best
as a small sample size correction for estimating Kullback–Liebler (KL)
distance in phylogenetic models (Supporting Materials). This also has an
intuitive appeal. In models that have at least some parameters shared
across sites and some parameters shared across taxa, increasing the number
of sites and/or taxa should be adding more samples for the parameters to
estimate. This is consistent considering how likelihood is calculated for
phylogenetic models: the likelihood for a given site is the sum of the
probabilities of each observed state at each tip, which is then multiplied
across sites. It is arguable that the conventional approach in comparative
methods is calculating AICc in the same way. That is, if only one column of
data (or “site”) is examined, as remains remarkably common in comparative
methods, when we refer to sample size, it is technically the number of taxa
multiplied by number of sites, even though it is referred to simply as the
number of taxa.

I suspect this is still not a great approximation. Compare a balanced tree
(every internal node having two descendants) with every internal branch
length the same versus a pectinate (caterpillar) tree where the two edges
connecting to the root node are very long and the other edges are all near
zero. For the same number of taxa and same number of sites, I bet the first
tree has more meaningful data: the pectinate tree with those branch lengths
will likely have all but one of the taxa having nearly identical states. So
I think tree shape and branch lengths should matter for this. I've done
some preliminary analyses on this, building on Beaulieu et al. (2018) and
Jhwueng et al. (2014, doi.org; also note COI), but nothing definitive yet.

It's also worth looking at Ho and Ané (2014, doi.org), who talk about AIC
in the context of OU shifts, but who get into sample size with shifts in a
modified BIC that uses taxa in different regimes as sample size (but again,
univariate, so maybe it's actually matrix size).

I also probably am missing important work by others -- my apologies if so.
If you know of any, please let me know (and probably Karla, too!).

So, in summary.... yeah, what Liam said: number of taxa, but it might be
more complex.

Best,
Brian

______________________________
Brian O'Meara, brianomeara.info, especially Calendar <brianomeara.info>,
CV <brianomeara.info>, and Feedback <brianomeara.info>

Professor, Dept. of Ecology & Evolutionary Biology, UT Knoxville
Associate Head, Dept. of Ecology & Evolutionary Biology, UT Knoxville
He/Him/His

------------------------------

Message: 6
Date: Thu, 5 Sep 2019 16:07:56 +0000
From: Cecile Ane <cecile....@wisc.edu>
To: R Sig Phylo Listserv <r-sig-phylo@r-project.org>
Subject: Re: [R-sig-phylo] model averaging using brownie.lite
Message-ID: <080ACBDA-BAE1-48C3-B451-E02CB
Content-Type: text/plain; charset="utf-8"

Thanks Brian, great review, as always!

To add one bit: this paper looks at the effective sample size that should
be used for BIC, in the standard BM model (univariate).
projecteuclid.org

It gives a formula that depends on the tree shape and branch lengths. Like
what Brian said: a pectinate tree would generally have a smaller effective
sample size than a symmetric tree, for the same number of taxa. The general
formula uses matrices, but the result should be something less than the
number of taxa, and greater than: # branches stemming from the root * ratio
(total tree height / length of shortest branch stemming from the root). The
effective sample size should also be at least (total tree length / total
tree height) for an ultrametric tree. See end of section 2 for an example
of BIC penalties using effective sample sizes.

The bottom line is the same as what Brian said:
- it's generally unknown what "sample size" should be used
- in cases when we know, the answer is complicated (it depends on the tree
  and on the model).

With multivariate data (multiple sites), the effective sample size for
univariate data (like number of taxa or something smaller) should be
multiplied by the number of sites, if the model assumes that sites are
independent and share the same evolutionary parameters (consistent with
what Brian said).

Cécile

------------------------------
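
A rough sketch of the simplest quantity mentioned in Message 6, the ratio (total tree length / total tree height) for an ultrametric tree, which that message gives as a floor for the effective sample size. It is not the matrix formula from the paper. The helpers `node_depths` and `ess_lower_bound` are hypothetical names; the edge matrix layout (columns parent, child, with tips numbered first) is assumed to mirror ape's "phylo" objects, and the two four-tip trees are toy versions of the balanced and pectinate shapes from Message 5.

```r
## Root-to-node distances from an edge matrix (columns: parent, child) and
## the matching branch lengths. Nodes are assumed to be numbered 1..max(edge).
node_depths <- function(edge, edge.length) {
  root <- setdiff(edge[, 1], edge[, 2])
  stopifnot(length(root) == 1)
  depth <- rep(NA_real_, max(edge))
  depth[root] <- 0
  ## Repeatedly fill in children whose parent depth is already known.
  while (anyNA(depth[edge[, 2]])) {
    ready <- !is.na(depth[edge[, 1]]) & is.na(depth[edge[, 2]])
    stopifnot(any(ready))
    depth[edge[ready, 2]] <- depth[edge[ready, 1]] + edge.length[ready]
  }
  depth
}

## Total tree length divided by tree height, for ultrametric trees only.
ess_lower_bound <- function(edge, edge.length) {
  depth  <- node_depths(edge, edge.length)
  tips   <- setdiff(edge[, 2], edge[, 1])
  height <- max(depth[tips])
  stopifnot(all(abs(depth[tips] - height) < 1e-8))  # ultrametric check
  sum(edge.length) / height
}

## Balanced tree ((1,2),(3,4)) with all branch lengths equal to 1.
edge_bal <- matrix(c(5,6, 6,1, 6,2, 5,7, 7,3, 7,4), ncol = 2, byrow = TRUE)
len_bal  <- rep(1, 6)

## Pectinate tree (1,(2,(3,4))) with two long root edges, the rest near zero.
edge_pec <- matrix(c(5,1, 5,6, 6,2, 6,7, 7,3, 7,4), ncol = 2, byrow = TRUE)
len_pec  <- c(1, 0.98, 0.02, 0.01, 0.01, 0.01)

b_bal <- ess_lower_bound(edge_bal, len_bal)
b_pec <- ess_lower_bound(edge_pec, len_pec)
print(c(balanced = b_bal, pectinate = b_pec))
```

Both trees have four tips, yet the ratio is smaller for the pectinate tree, which is in line with the point made in Messages 5 and 6 that tree shape and branch lengths matter for sample size.

------------------------------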
Message: 7
Date: Thu, 5 Sep 2019 09:40:39 -0700
From: Gregory Mutumi <gmut...@gmail.com>
To: "r-sig-phylo@r-project.org" <r-sig-phylo@r-project.org>
Subject: [R-sig-phylo] Rate of trait evolution through time
Message-ID: <CAA5DsKHi8f529xxfBox0u7wa18g6
Content-Type: text/plain; charset="utf-8"

Dear All

I want to extract rates for about 100 bins across a phylogeny and plot the
rate through time. Can this be done in BTRtools, or are there other packages
to handle the next steps? I want something like figure 1b of the paper
'Ecomorphological diversification in squamates from conserved pattern of
cranial integration' (attached).

SuppMat_Watanabe et al 2019 Ecomorphological di... <drive.google.com>
Watanabe et al 2019 Ecomorphological diversific... <drive.google.com>

I managed to do this with BAMM and BAMMtools, but the problem is that I can
only use 1 PC (please also find attached the plot I got).

BayesPCM1 <drive.google.com>
BayesPCM1.Log.txt <drive.google.com>
BayesPCM1.Output.trees <drive.google.com>
BayesPCM1.Schedule.txt <drive.google.com>
BayesPCM1.VarRates.txt <drive.google.com>

Please find attached the 'BayesTrait' output files for one of the
structures and also the txt data of the first 9 PCs. I would appreciate an
idea of what steps to take and which packages to use. I am very new to
Bayesian analyses.

Thank you.

Yours sincerely

Gregory

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Figure 7 - updated.tif
Type: image/tiff
Size: 304104 bytes
Desc: not available
URL: <stat.ethz.ch>

------------------------------

End of R-sig-phylo Digest, Vol 140, Issue 3
******************************


        [[alternative HTML version deleted]]

_______________________________________________
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
