Sample size for AICc is a weird thing in this area. For comparing DNA substitution models in something like ModelTest, the number of sites is used, but for OU/BM models we typically use the number of taxa. It's not resolved which is best.

Posada and Buckley (2004, https://doi.org/10.1080/10635150490522304) have a discussion on this:

Both in the AICc and the BIC descriptions above, the total number of characters was used as an estimate of sample size. However, effective sample sizes in phylogenetic studies are poorly understood, and depend on the quantity of interest (Churchill et al., 1992; Goldman, 1998; Morozov et al., 2000). Characters in an alignment will often not be independent, so using the total number of characters as a surrogate for sample size (Minin et al., 2003; Posada and Crandall, 2001b) could be an overestimate. Using only the number of variable sites as an estimate of sample size is a more conservative approach, but could be an underestimate (note that all sites are used when estimating base frequencies or the proportion of invariable sites). Indeed, sample size also depends on the number of taxa. Importantly, sample size can have an effect on the outcome of model selection with the AICc. In our example above, if we were to use the number of variable characters (301 sites) as the sample size, instead of the total number of characters (1927 sites), the best AICc model would not change, but the second and third AICc models would exchange their rankings.

Furthermore, because the LRT, the AIC, and the BIC strategies rely on large sample asymptotics, it is also important to decide when a sample should be considered small. Although the AICc was derived under Gaussian assumptions, Burnham et al. (1994) found that this second order expression performed well in product multinomial models for open population capture-recapture. Burnham and Anderson (2003, p. 66) suggest using this correction when the sample size is small compared to the number of adjustable parameters, n/K < 40. Alternatively, and because AICc converges to the AIC with increasing n/K ratios, one could always use the AICc (D. Anderson, personal communications).

Phylogenetic characters are mostly discrete, and the unconstrained model in phylogenetics is multinomial (Goldman, 1993). One may think of an alignment of nucleotide characters as a large and sparse contingency table with 4^T bins, where T is the number of taxa. For large sample asymptotics to hold in a contingency table every cell should contain, in general, more than 5 observations (see Agresti, 1990, p. 49, 244–250), which gives a rule of thumb of n/4^T > 5. Clearly, more research is needed on sample size in phylogenetics.

Beaulieu et al. (2018, https://doi.org/10.1093/molbev/msy222; note my COI as I'm an author on this) did some simulations on a codon model testing different ways of counting sample size (number of sites, number of taxa, number of sites * number of taxa, etc.) and found that the number of cells in the matrix (number of sites * number of taxa) seemed to work best to approximate Kullback-Leibler distance. For univariate models like the one used in brownie.lite, the number of cells is equal to the number of taxa (since there's only one column):

We note our use of AICc, as calculated in Burnham and Anderson (2002, p. 66) and as opposed to the standard AIC, in the above model comparisons. At the outset of our study it was unclear what the appropriate sample size n is when comparing models of sequence evolution. Building upon the work of Jhwueng et al. (2014), our simulations suggest that using the number of taxa times the number of sites as the sample size correction performed best as a small sample size correction for estimating Kullback–Leibler (KL) distance in phylogenetic models (Supporting Materials). This also has an intuitive appeal. In models that have at least some parameters shared across sites and some parameters shared across taxa, increasing the number of sites and/or taxa should be adding more samples for the parameters to estimate.

This is consistent considering how likelihood is calculated for phylogenetic models: the likelihood for a given site is the sum of the probabilities of each observed state at each tip, which is then multiplied across sites. It is arguable that the conventional approach in comparative methods is calculating AICc in the same way. That is, if only one column of data (or “site”) is examined, as remains remarkably common in comparative methods, when we refer to sample size, it is technically the number of taxa multiplied by number of sites, even though it is referred to simply as the number of taxa.

I suspect this is still not a great approximation. Compare a balanced tree (every internal node having two equally sized descendant clades) with every internal branch length the same versus a pectinate (caterpillar) tree where the two edges connecting to the root node are very long and the other edges are all near zero. For the same number of taxa and same number of sites, I bet the first tree has more meaningful data: the pectinate tree with those branch lengths will likely have all but one of the taxa having nearly identical states. So I think tree shape and branch lengths should matter for this. I've done some preliminary analyses on this, building on Beaulieu et al. (2018) and Jhwueng et al. (2014, https://doi.org/10.1515/sagmb-2013-0048, also note COI), but nothing definitive yet.

It's also worth looking at Ho and Ané (2014, https://doi.org/10.1111/2041-210X.12285), who talk about AIC in the context of OU shifts, but who get into sample size with shifts in a modified BIC that uses taxa in different regimes as sample size (but again, univariate, so maybe it's actually matrix size).

I am also probably missing important work by others -- my apologies if so. If you know of any, please let me know (and probably Karla, too!).

So, in summary... yeah, what Liam said: number of taxa, but it might be more complex.

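To make the practical side concrete, here is a minimal sketch in R of the usual second-order correction, AICc = -2 logL + 2k + 2k(k+1)/(n-k-1), with n left as an argument so the different sample-size conventions discussed above can be tried side by side. The helper name `aicc` and all of the numbers are assumptions made up for illustration; they do not come from any real analysis. With a real brownie.lite fit, the log-likelihoods and parameter counts would presumably come from `fit$logL1`, `fit$logL.multiple`, `fit$k1`, and `fit$k2`, as in Liam's logLik method quoted below.

```r
## Second-order AIC; n is whatever sample size you decide to use
## (hypothetical helper, not part of phytools)
aicc <- function(logL, k, n) {
  if (any(n - k - 1 <= 0)) stop("AICc is undefined unless n > k + 1")
  -2 * logL + 2 * k + (2 * k * (k + 1)) / (n - k - 1)
}

## Made-up values for illustration only (not from a real fit).
## k = 2 and k = 3 mimic a one-rate vs. two-rate Brownian comparison
## (rate or rates plus the root state).
logL <- c("single-rate" = -50.0, "multi-rate" = -47.5)
k <- c(2, 3)

## One trait column, so taxa * sites is just the number of tips
aicc_small <- aicc(logL, k, n = 8 * 1)   # single-rate 106.4, multi-rate 107.0
aicc_large <- aicc(logL, k, n = 20 * 1)  # single-rate about 104.71, multi-rate 102.5

names(which.min(aicc_small))  # "single-rate"
names(which.min(aicc_large))  # "multi-rate"
```

With the same log-likelihoods, the preferred model flips between n = 8 and n = 20, which is why the choice of n is not just a technicality when n/K is small.
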
Best,
Brian

_______________________________________________________________________
Brian O'Meara, http://brianomeara.info, especially Calendar <http://brianomeara.info/calendar.html>, CV <http://brianomeara.info/cv.html>, and Feedback <http://brianomeara.info/feedback.html>

Professor, Dept. of Ecology & Evolutionary Biology, UT Knoxville
Associate Head, Dept. of Ecology & Evolutionary Biology, UT Knoxville
He/Him/His

On Thu, Sep 5, 2019 at 10:00 AM Liam Revell <liam.rev...@umb.edu> wrote:

> Dear Karla.
>
> In my opinion, it is probably correct to use the number of tips on the
> tree as the sample size for AICc when estimating the Brownian rate: as
> the number of independent pieces of information is n-1, just like with
> an ordinary variance. For other parameters in phylogenetic comparative
> analyses, the effective sample size may be different, however.
>
> All the best, Liam
>
> Liam J. Revell
> Associate Professor, University of Massachusetts Boston
> Profesor Asistente, Universidad Católica de la Ssma Concepción
> web: http://faculty.umb.edu/liam.revell/, http://www.phytools.org
>
> Academic Director UMass Boston Chile Abroad (starting 2019):
> https://www.umb.edu/academics/caps/international/biology_chile
>
> On 9/5/2019 9:49 AM, Karla Shikev wrote:
> > Thanks so much, Liam! Just one quick follow-up question: what do you
> > suggest should be the sample size for transforming AIC into AICc? the
> > number of tips on the tree?
> >
> > Karla
> >
> > On Thu, Sep 5, 2019 at 10:27 AM Liam Revell <liam.rev...@umb.edu> wrote:
> >
> >> Dear Karla.
> >>
> >> You could try & create your own logLik method for the object class
> >> "brownie.lite" as follows:
> >>
> >> ## method
> >> logLik.brownie.lite<-function(object,...){
> >>     lik<-setNames(
> >>         c(object$logL1,object$logL.multiple),
> >>         c("single-rate","multi-rate"))
> >>     attr(lik,"df")<-c(object$k1,object$k2)
> >>     lik
> >> }
> >> ## fit model
> >> fit<-brownie.lite(tree,x)
> >> ## use it
> >> logLik(fit)
> >> AIC(fit)
> >>
> >> All the best, Liam
> >>
> >> On 9/5/2019 9:13 AM, Karla Shikev wrote:
> >>> Dear all,
> >>>
> >>> I've been trying to use brownie.lite to implement the tutorial available
> >>> here (http://treethinkers.org/tutorials/morphological-evolution-in-r/) to
> >>> calculate model-averaged rates of evolution and for model selection (1
> >>> versus 2 rates). However, the current version of phytools 0.6-99 won't
> >>> produce AICc estimates. Does anyone know a way around this? Any help would
> >>> be greatly appreciated.
> >>>
> >>> thanks a bunch,
> >>>
> >>> Karla
> >>>
> >>> _______________________________________________
> >>> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> >>> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> >>> Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/