Re: [R-sig-phylo] LL ratio test
On 12-06-17 07:35 AM, Joe Felsenstein wrote:

> Ben Bolker wrote:
>
>> I'd like to chime in here ... again ready to be corrected by others. The description above doesn't match my understanding of AIC at all (although, again, I will be perfectly happy, and in fact really interested, if you can point me to a reference that lays out this justification for the AIC). AIC is an estimate of the expected (Kullback-Leibler) distance between a hypothetical 'true model' and any specified model.
>
> Thanks, that is very clear. It is the correct description. But it leaves me unsatisfied. Consider, for example, the case where we have two models, M0 and M1. Suppose that they are in fact nested, with degrees of freedom d0 and d1. Now we can apply Fisher's likelihood ratio test as well as the AIC. The LRT tells us that the expectation ...

under the null hypothesis, where M0 (the simpler model) is true?

> ... of 2 log(L1) - 2 log(L0) is d1 - d0 (because it is distributed as a chi-square with that number of degrees of freedom). But the AIC tells us that the expectation is 2(d1 - d0).

Maybe I'm missing something, but I don't see how the AIC tells us anything about the expectation of 2 log(L1) - 2 log(L0). It gives us the expectation of the Kullback-Leibler distance, which is something like sum_i p(i) log(p(i)/q(i)), where p(i) is the true distribution of outcomes and q(i) is the predicted distribution of outcomes ... so it's something more like a marginal log-likelihood difference than a maximum log-likelihood difference ...

> This is, I gather, not a conflict, because the assumption of the AIC is that neither M0 nor M1 is correct, but rather some model M' with an infinite number of degrees of freedom. So both assertions are (formally) correct. But what if M0 were actually correct? Are we supposed to use the AIC then?

I would say that if you want to find the _correct_ model, and you think that the correct model might be in the set of your candidate models, you ought to be using BIC/SIC.
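The chi-square claim above is easy to check numerically. Here is a minimal Python sketch (my illustration, not from the thread) using the simplest nested pair: X ~ Normal(mu, 1) with sigma known, where M0 fixes mu = 0 and M1 estimates it, so d1 - d0 = 1. In this case 2 log(L1/L0) simplifies to n * xbar^2, which under the null is exactly chi-square with one degree of freedom:

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 50, 20000

# Data simulated under M0: X ~ Normal(0, 1), sigma known.
# Under M1 the MLE of mu is the sample mean, and the LRT statistic
# 2*(logL1 - logL0) simplifies algebraically to n * xbar^2.
xbar = rng.standard_normal((reps, n)).mean(axis=1)
lrt = n * xbar**2

print(round(lrt.mean(), 2))           # close to d1 - d0 = 1
print(round((lrt > 3.84).mean(), 3))  # close to 0.05, the chi-square(1) 5% tail
```

The simulated mean of the statistic comes out near 1 (= d1 - d0), not 2(d1 - d0), which is exactly the tension being discussed.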
Along with Burnham and Anderson, I think that in ecological and evolutionary analysis it is very unlikely that the true model is *ever* in your set of candidate models (note that I do disagree with them on a lot of things!) ...

> I also understand that AIC does not give us a distribution of the test statistic; the LRT does. For example, in the case of phylogenies that all have the same number of degrees of freedom, all AIC does is tell us to prefer the one with the highest likelihood.

Yes. People have come up with various (often misused) rules of thumb about how big an AIC difference is large (please don't say "significant"), but it really boils down to understanding the log-likelihood scale in a basic (non-probabilistic) way, as promoted e.g. by some of the pure likelihoodists (Edwards, Royall): a 2-point difference on the log-likelihood scale corresponds to roughly a 7.4-fold (e^2) difference in likelihood, so it would be a reasonable choice for a cutoff between small and large likelihood differences. People sometimes reason that adding a single useless parameter adds +2 to the AIC, so a difference of 2 is equivalent to less than one effective parameter's worth of difference (hence small). (I think lots of people here, including you, already know this ...) If you want distributions of test statistics, I claim it makes the most sense to work in a likelihood-ratio-test-based framework. (I can imagine that it would be possible to derive asymptotic distributions for AICs, but I've never seen it done ...)

> Anyway, thanks for the clarification, which makes clear that the rationale of using the LRT to justify the AIC is incorrect.
>
> Joe
>
> Joe Felsenstein, j...@gs.washington.edu
> Department of Genome Sciences and Department of Biology,
> University of Washington, Box 355065, Seattle, WA 98195-5065 USA

_______________________________________________
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] LL ratio test
Ben Bolker --

>> Now we can apply Fisher's likelihood ratio test as well as the AIC. The LRT tells us that the expectation ...
>
> under the null hypothesis, where M0 (the simpler model) is true?

Yup. I'm considering that case to see how the AIC fits in with the LRT. Of course the AIC is proposed for a much wider range of cases.

>> ... of 2 log(L1) - 2 log(L0) is d1 - d0 (because it is distributed as a chi-square with that number of degrees of freedom). But the AIC tells us that the expectation is 2(d1 - d0).
>
> Maybe I'm missing something, but I don't see how the AIC tells us anything about the expectation of 2 log(L1) - 2 log(L0). It gives us the expectation of the Kullback-Leibler distance, which is something like sum_i p(i) log(p(i)/q(i)), where p(i) is the true distribution of outcomes and q(i) is the predicted distribution of outcomes ... so it's something more like a marginal log-likelihood difference than a maximum log-likelihood difference ...

Well, the AIC ends up comparing -2 log(L) + 2d for the two hypotheses. The difference of these for models M1 and M0 is just (the negative of) 2 log(L1/L0) - 2(d1 - d0). Or have I missed something here? So the expectation of the difference in log-likelihood *is* described by the AIC, right? And isn't it (in view of Fisher's distribution) wrong too? That is what disturbs me and makes me feel there is something I don't understand about the AIC argument.

Joe
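Joe's algebra here can be written out as a one-line identity: the AIC difference between nested models is exactly the LRT statistic minus twice the parameter difference. A tiny Python check, with made-up illustrative log-likelihoods rather than values from the thread:

```python
import math

# Hypothetical maximized log-likelihoods for two nested models (illustrative only)
logL0, d0 = -100.0, 2   # simpler model M0
logL1, d1 = -97.5, 3    # richer model M1

aic0 = -2 * logL0 + 2 * d0
aic1 = -2 * logL1 + 2 * d1

lrt = 2 * (logL1 - logL0)  # the likelihood-ratio statistic, 2 log(L1/L0)

# AIC0 - AIC1 is exactly the LRT statistic minus twice the parameter difference:
assert math.isclose(aic0 - aic1, lrt - 2 * (d1 - d0))
print(aic0 - aic1)  # 3.0 here: M1 is preferred despite the 2-unit penalty
```

So comparing AICs is the same operation as comparing the LRT statistic against a fixed cutoff of 2(d1 - d0), which is where the apparent conflict with the chi-square expectation of d1 - d0 comes from.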
Re: [R-sig-phylo] LL ratio test
Hi list,

I agree with Ben's definition of AIC -- the expected (Kullback-Leibler) distance between a hypothetical 'true model' and any specified model -- I just feel that it doesn't give any intuition to a frequentist, which is what I thought the question was asking.

Ben, for references to this derivation I like Cavanaugh 1997 (http://www.sciencedirect.com/science/article/pii/S0167715296001289). The paper is actually about AIC vs AICc, but it provides a nice clean derivation showing that the AIC penalty yields an asymptotically unbiased estimate, whereas the maximum likelihood estimate (MLE) alone is biased by that amount. (Consider that the MLE of the order parameter k for a polynomial is n - 1; it's clear the MLE is biased.) I may have missed something in my interpretation here, so I'm happy to stand corrected. In my reply I was only hoping to show that there is a natural connection between AIC and familiar frequentist concepts; perhaps the information terminology makes AIC sound more foreign. As Joe points out, if I'm willing to make some parametric assumptions, I can get a distribution of the AIC statistic like any other statistic. Whether that's consistent with one's philosophical beliefs is a separate issue. Personally, I find the "true model" issue to be a bit of a straw man; models are approximations, and "best" depends on why you're modeling.

-Carl

On Sun, Jun 17, 2012 at 12:33 PM, Joe Felsenstein <j...@gs.washington.edu> wrote:

> [Joe's message of 12:33 PM, quoted in full above]

-- 
Carl Boettiger
UC Davis
http://www.carlboettiger.info/
Re: [R-sig-phylo] LL ratio test
Carl Boettiger wrote:

> Others on the list can weigh in with more authority, but perhaps this will get the discussion started.

Yes, it's important to know whether the parameters are nested, and the issue of being at the end of a parameter range is serious.

> Recall that AIC values are a frequentist statistic, and they obey the very same distribution as the likelihood ratio (recall it is a difference of log-likelihoods, just shifted by the difference in the number of parameters, e.g. -2 [log L1 - log L0 - (k1 - k0)]). Recall that the maximum likelihood estimate (MLE) is a biased estimate of the likelihood of your data, and that the AIC penalty simply creates an asymptotically unbiased estimator of the true model likelihood, which is a frequentist concept to begin with. Why we report confidence intervals/p-values in the case of one of these statistics but not the other is not obvious to me either.

I will confess my relative ignorance of AIC issues (my phylogeny book has a simple, elegant, and clear explanation -- which I wrote in a hurry while excited that I finally understood this, and which turns out to make no sense whatsoever and should be firmly ignored by all). But I do know this: if we have the likelihood ratio R = L(p')/L(p), where p' is the ML parameter values and p is the true parameter values, and where p is in the interior of the set of possible parameters, then R. A. Fisher showed, about 1922, that asymptotically, with large amounts of data, 2 log(R) is distributed as chi-square with D degrees of freedom, where D is the difference between the number of parameters being estimated in p' and the number being estimated in p. Now, we know that the expectation of that chi-square variable is D. So to correct the bias in R we should subtract D. That sounds like what Carl is explaining too. It sounds like a very simple and clear explanation of the AIC. Unfortunately, that subtraction is *not* what the AIC does. It subtracts 2D.

The reason it does so is unclear to me. It involves some kind of prior on models, I think. As far as I am concerned it is like the peace of God, in that it passeth human understanding. Maybe the experts here can give me a simple explanation. Otherwise maybe we should honor Fisher (not me) and only subtract D, and call the result the FIC. But that works only for nested hypotheses, and the main point of the AIC is to deal with non-nested hypotheses. To make matters worse, in my field the AIC has the reputation of too easily favoring the most complex hypothesis, so maybe we should be subtracting more than 2D, not less.

Clueless in Seattle,

Joe
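The "subtract D" observation is easy to check by simulation. The sketch below (my own Python illustration, not from the thread) fits a Normal(mu, sigma) model by maximum likelihood, so D = 2 parameters are estimated, and compares the maximized log-likelihood to the log-likelihood at the true parameters; the average of 2 log(R) comes out near D, as the chi-square result predicts. (For what it's worth, one standard resolution of the 2D puzzle, e.g. in Burnham and Anderson's account, is that the D/2 optimism enters Akaike's derivation twice: once because the parameters were fit to this sample, and once more because the Kullback-Leibler target scores the fitted model against fresh data, giving a total penalty of D on the log-likelihood scale, hence 2D on the deviance scale.)

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 200, 5000

def loglik(x, mu, sigma2):
    # Gaussian log-likelihood of the sample x at parameters (mu, sigma2)
    return -0.5 * len(x) * np.log(2 * np.pi * sigma2) - ((x - mu)**2).sum() / (2 * sigma2)

two_log_R = np.empty(reps)
for r in range(reps):
    x = rng.standard_normal(n)          # true model: Normal(0, 1)
    mu_hat, s2_hat = x.mean(), x.var()  # MLEs of the D = 2 parameters
    two_log_R[r] = 2 * (loglik(x, mu_hat, s2_hat) - loglik(x, 0.0, 1.0))

print(round(two_log_R.mean(), 2))  # close to D = 2, per Fisher/Wilks
```

The maximized log-likelihood exceeds the true-parameter log-likelihood by about D/2 = 1 on average, i.e. by D on the -2 log L scale, which is exactly the bias the "subtract D" correction would remove.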
Re: [R-sig-phylo] LL ratio test
Hi Anurag, list,

Others on the list can weigh in with more authority, but perhaps this will get the discussion started. I think your question gets at how the models are nested. To ensure that the likelihood ratios are chi-square distributed, you also need to make sure that the parameter is not constrained against a limit (for instance, a Brownian-motion process is only nested in an OU process by setting alpha to zero, and alpha is frequently constrained to be positive). In this case you have a mixture of chi-square and zero values (there's a rich literature on this, e.g. Ota et al. 2000, http://mbe.oxfordjournals.org/content/17/5/798.abstract, but for some reason it is frequently ignored). That is, if the parameter is constrained, as in the OU/BM case, then you are only testing whether the parameter estimate is significantly larger than zero; the test is one-tailed, and the statistic is not chi-square distributed.

Recall that AIC values are a frequentist statistic, and they obey the very same distribution as the likelihood ratio (recall the AIC difference is a difference of log-likelihoods, just shifted by the difference in the number of parameters, e.g. -2 [log L1 - log L0 - (k1 - k0)]). Recall that the maximum likelihood estimate (MLE) is a biased estimate of the likelihood of your data, and that the AIC penalty simply creates an asymptotically unbiased** estimator of the true model likelihood, which is a frequentist concept to begin with. Why we report confidence intervals/p-values in the case of one of these statistics but not the other is not obvious to me either.

-Carl

** The MLE is biased in the frequentist sense, as follows: simulate data under some true model and then evaluate the likelihood of those data under the true parameters. Now estimate the parameters from the data by maximum likelihood. This second likelihood will always be greater than or equal to the first, true likelihood, making it a biased estimate (even though it converges to the true value). The corresponding AIC score should be symmetrically distributed about that true likelihood.

On Wed, Jun 13, 2012 at 8:45 AM, Anurag Agrawal <aa...@cornell.edu> wrote:

> Dear physigs,
>
> I've been using likelihood ratio tests in various statistical models and have seen mixed usage of two- vs. one-tailed tests of the difference in the LL of two models. On the one hand, a one-tailed test seems reasonable because removing a parameter can only reduce the model fit ... on the other hand, perhaps this is accounted for by the shape of a chi-square distribution (which is bounded by zero on the left). What should we be doing? I know I should be using AIC values, but I am having difficulty escaping the frequentist paradigm.
>
> Many thanks,
> -Anurag
>
> p.s. This is what Mark Pagel said a few years ago: "When your test allows outcomes in either direction (plus or minus) you should set alpha in each tail at alpha/2 to have a long-run expectation of making Type I errors at rate alpha."
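The boundary problem can be illustrated with the simplest possible case (a Python sketch of my own, not from the thread): X ~ Normal(theta, 1) with theta constrained to be nonnegative, testing H0: theta = 0. Under the null the constrained MLE max(xbar, 0) is zero half the time, so 2 log(R) follows a 50:50 mixture of a point mass at zero and chi-square(1); the correct 5% critical value is then the chi-square(1) 10% point (about 2.71), and using the naive 3.84 cutoff is conservative:

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 100, 20000

# Data simulated under H0: theta = 0, with theta constrained to be >= 0.
xbar = rng.standard_normal((reps, n)).mean(axis=1)
theta_hat = np.maximum(xbar, 0.0)   # boundary-constrained MLE
lrt = n * theta_hat**2              # 2 log R for this one-parameter model

print(round((lrt == 0).mean(), 2))     # about 0.5: the point mass at zero
print(round((lrt > 2.706).mean(), 3))  # about 0.05 = 0.5 * P(chi-square(1) > 2.706)
print(round((lrt > 3.841).mean(), 3))  # about 0.025: the naive cutoff rejects too rarely
```

This is the chi-bar-square mixture from the Ota et al. literature cited above, in miniature: halving the nominal tail probability is the one-parameter boundary correction.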