Re: [R-sig-phylo] LL ratio test

2012-06-17 Thread Ben Bolker
On 12-06-17 07:35 AM, Joe Felsenstein wrote:
 
 Ben Bolker wrote:
 
  I'd like to chime in here ... again ready to be corrected by others.
 The description above doesn't match my understanding of AIC at all
 (although again I will be perfectly happy, and in fact really
 interested, if you can point me to a reference that lays out this
 justification for the AIC).  AIC is an estimate of the expected
 (Kullback-Leibler) distance between a hypothetical 'true model' and
 any specified model. 
 
 Thanks, that is very clear.  It is the correct description.  But that
 leaves me unsatisfied.  Consider, for example, the case where
 we have two models, M0 and M1.   Suppose that they are in
 fact nested, with degrees of freedom  d0  and  d1.
 
 Now we can apply Fisher's Likelihood Ratio Test (LRT) as well
 as the AIC.  The LRT tells us that the expectation

  ... under the null hypothesis where M0 (the simpler model) is true?

 of  2 log(L1) - 2 log(L0)  is  d1 - d0  (because it is distributed as
 a chi-square with that number of degrees of freedom).
 
 But the AIC tells us that the expectation is  2(d1 - d0).

  Maybe I'm missing something, but I don't see how the AIC tells us
something about the expectation of 2 log(L1) - 2 log(L0)?  It gives us
the expectation of the Kullback-Leibler distance, which is something
like sum_i p(i) log(p(i)/q(i)), where p(i) is the true distribution of
outcomes and q(i) is the predicted distribution of outcomes ... so it's
something more like a marginal log-likelihood difference than a
maximum log-likelihood difference ...
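  A quick R sketch of that quantity, with made-up discrete
distributions just to fix ideas (p and q here are purely illustrative,
not from any real model):

# Kullback-Leibler divergence D(p || q) for discrete distributions
p <- c(0.5, 0.3, 0.2)   # hypothetical 'true' distribution of outcomes
q <- c(0.4, 0.4, 0.2)   # distribution predicted by a candidate model
sum(p * log(p / q))     # ~0.025; zero exactly when q matches p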
 
 This is, I gather, not a conflict because the assumption of the
 AIC is that neither M0 nor M1 is correct, but instead a model
 M' which has an infinite number of degrees of freedom is
 correct.   So both assertions are (formally) correct. 
 
 But what if  M0  was actually correct?  Are we supposed
 to use AIC?   

  I would say that if you want to find the _correct_ model, and you
think that the correct model might be in the set of your candidate
models, you ought to be using BIC/SIC. Along with Burnham and Anderson,
I think that in ecological and evolutionary analysis it is very unlikely
that the true model is *ever* in your set of candidate models (note that
I do disagree with them on a lot of things!) ...
 
 I also understand that AIC does not give us a distribution
 of the test statistic, LRT does.   For example, in the case of
 phylogenies that all have the same number of degrees of
 freedom, all AIC does is tell us to prefer the one with
 highest likelihood.

  Yes.  People have come up with various (often misused) rules of thumb
about how big an AIC difference is "large" (please don't say
"significant"), but it really boils down to understanding the
log-likelihood scale in a basic (non-probabilistic) way, as promoted
e.g. by some of the pure likelihoodists (Edwards, Royall) -- a 2-point
difference on the log-likelihood scale corresponds to roughly an
eightfold (exp(2) ~ 7.4) ratio of likelihoods, so it would be a
reasonable choice for a cutoff between "small" and "large" likelihood
differences.  People sometimes reason that adding a single useless
parameter adds +2 to the AIC, so a difference of 2 is equivalent to
less than one effective parameter's worth of difference (hence
"small").
  (I think lots of people here, including you, already know this ...)
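  For concreteness, a small base-R illustration of that scale (the AIC
values are invented; only the arithmetic matters):

exp(2)   # ~7.39: a 2-point log-likelihood difference is
         # roughly an eightfold likelihood ratio

# Akaike weights for a hypothetical candidate set:
aic   <- c(100, 102, 106)
delta <- aic - min(aic)
w     <- exp(-delta / 2)
w / sum(w)   # ~0.71, 0.26, 0.04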

  If you want distributions of test statistics, I claim it makes the
most sense to work in a likelihood-ratio-test-based framework.  (I can
imagine that it would be possible to derive asymptotic distributions for
AICs, but I've never seen it done ...)
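  For nested models, at least, the null distribution of an AIC
difference follows immediately from the LRT one; a minimal simulation
sketch (base R, nested lm fits, one useless extra predictor):

set.seed(1)
lrt <- replicate(2000, {
  x <- rnorm(100); z <- rnorm(100)
  y <- 1 + 2 * x + rnorm(100)   # true model omits z, so M0 is true
  m0 <- lm(y ~ x); m1 <- lm(y ~ x + z)
  as.numeric(2 * (logLik(m1) - logLik(m0)))
})
mean(lrt)       # ~1 = d1 - d0, matching the chi-square expectation
mean(lrt - 2)   # the corresponding AIC difference: the same
                # chi-square, shifted down by 2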

 Anyway, thanks for the clarification, which makes clear
 that the rationale of using the LRT to justify the AIC
 is incorrect.
 
 Joe
 
 Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA
 
 




Re: [R-sig-phylo] LL ratio test

2012-06-17 Thread Joe Felsenstein

Ben Bolker --

 Now we can apply Fisher's Likelihood Ratio Test (LRT) as well
 as the AIC.  The LRT tells us that the expectation
 
  ... under the null hypothesis where M0 (the simpler model) is true?

Yup.  I'm considering that case to see how the AIC fits in with the LRT.
Of course the AIC is proposed for a much wider range of cases.

 of  2 log(L1) - 2 log(L0)  is  d1 - d0  (because it is distributed as
 a chi-square with that number of degrees of freedom).
 
 But the AIC tells us that the expectation is  2(d1 - d0).
 
  Maybe I'm missing something, but I don't see how the AIC tells us
 something about the expectation of 2 log(L1) - 2 log(L0)?  It gives us
 the expectation of the Kullback-Leibler distance, which is something
 like sum_i p(i) log(p(i)/q(i)), where p(i) is the true distribution of
 outcomes and q(i) is the predicted distribution of outcomes ... so it's
 something more like a marginal log-likelihood difference than a
 maximum log-likelihood difference ...

Well, the AIC ends up comparing  -2 log(L) + 2d  for the two
hypotheses.  The difference of these for models M1 and M0
is just (the negative of)  2 log(L1/L0) - 2(d1 - d0).  Or have I
missed something here?  So the expectation of the difference
in log-likelihood *is* described by the AIC, right?  And isn't it
(in view of Fisher's distribution) wrong, too?  That is what
disturbs me and makes me feel there is something I don't
understand about the AIC argument.
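A quick numerical check of that identity, using an arbitrary pair of
nested lm fits (the data here are invented):

set.seed(1)
x <- rnorm(50); z <- rnorm(50)
y <- 1 + 2 * x + rnorm(50)
m0 <- lm(y ~ x); m1 <- lm(y ~ x + z)
AIC(m0) - AIC(m1)   # equals 2 log(L1/L0) - 2(d1 - d0):
as.numeric(2 * (logLik(m1) - logLik(m0))) -
  2 * (attr(logLik(m1), "df") - attr(logLik(m0), "df"))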

Joe

Joe Felsenstein j...@gs.washington.edu
Department of Genome Sciences and Department of Biology,
University of Washington, Box 355065, Seattle, WA 98195-5065 USA






Re: [R-sig-phylo] LL ratio test

2012-06-17 Thread Carl Boettiger
Hi list,

I agree with Ben's definition of AIC ("expected (Kullback-Leibler)
distance between a hypothetical 'true model' and any specified
model"); I just feel that doesn't give any intuition to a frequentist,
which is what I thought the question was asking.

Ben, for a reference for this derivation I like Cavanaugh 1997
(http://www.sciencedirect.com/science/article/pii/S0167715296001289).
The paper is actually about AIC vs AICc, but it provides a nice clean
derivation showing that the AIC penalty gives an asymptotically
unbiased estimate, whereas the maximum likelihood estimate (MLE) alone
is biased by that amount.  (Consider that the MLE of the order k of a
polynomial is n - 1; it's clear the MLE is biased.)  I may have missed
something in my interpretation here, so I'm happy to stand corrected.
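For reference, the small-sample correction Cavanaugh discusses is just
an extra term on top of the AIC; a sketch, as a hypothetical helper for
an lm fit (toy data, not from any real analysis):

# AICc = AIC + 2k(k+1)/(n - k - 1), with k counting sigma as a parameter
aicc <- function(fit) {
  k <- attr(logLik(fit), "df")
  n <- nobs(fit)
  AIC(fit) + 2 * k * (k + 1) / (n - k - 1)
}
set.seed(1)
x <- rnorm(15); y <- 1 + 2 * x + rnorm(15)
fit <- lm(y ~ x)
c(AIC = AIC(fit), AICc = aicc(fit))   # the correction matters at n = 15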

In my reply I was only hoping to show that there is a natural
connection between AIC and familiar frequentist concepts; perhaps the
"information" terminology makes AIC sound more foreign than it is.  As
Joe points out, if I'm willing to make some parametric assumptions, I
can get a distribution of the AIC statistic like any other statistic.
Whether that's consistent with one's philosophical beliefs is a
separate issue.  Personally, I find the "true model" to be a bit of a
straw-man issue; models are approximations, and "best" depends on why
you're modeling.

-Carl




-- 
Carl Boettiger
UC Davis
http://www.carlboettiger.info/



Re: [R-sig-phylo] LL ratio test

2012-06-14 Thread Joe Felsenstein

Carl Boettiger wrote:

 Others on the list can weigh in with more authority, but perhaps this will
 get the discussion started.

Yes, it's important to know whether the parameters are nested, and
the issue of being at the end of a parameter range is serious.

 Recall that AIC values are a frequentist statistic: they obey the
 very same distribution as the likelihood ratio (recall the AIC
 difference is a difference of log-likelihoods, just shifted by the
 difference in the number of parameters, e.g. -2 [log L1 - log L0 -
 (k1 - k0)]).  Recall that the maximum likelihood estimate (MLE) is a
 biased estimate of the likelihood of your data, and that the AIC
 penalty simply creates an asymptotically unbiased** estimator of the
 true model likelihood, which is a frequentist concept to begin with.
 Why we report confidence intervals/p-values for one of these
 statistics but not the other is not obvious to me either.

I will confess my relative ignorance of AIC issues (my phylogeny book
has a simple, elegant, and clear explanation -- which I wrote in a hurry 
while excited that I finally understood this, and which turns out to
make no sense whatsoever and should be firmly ignored by all).

But I do know this: if we have the likelihood ratio  R = L(p')/L(p),
where  p'  is the ML parameter values and  p  is the true parameter
values, and where  p  is in the interior of the set of possible
parameters, then R. A. Fisher showed around 1922 that, asymptotically
with large amounts of data,

2 log(R)  is distributed as chi-square with  D  degrees of freedom,
where  D  is the difference between the number of parameters being
estimated in  p'  and the number being estimated in  p.  Now we know
that the expectation of that chi-square variable is  D.  So to correct
the bias in  2 log(R)  we should subtract  D.  That sounds like what
Carl is explaining too.
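A minimal simulation sketch of Fisher's result (an invented linear
regression with known true parameters; D = 3 here, counting the
intercept, the slope, and sigma):

set.seed(1)
stat <- replicate(4000, {
  x <- rnorm(50)
  y <- 1 + 2 * x + rnorm(50, sd = 1.5)
  fit <- lm(y ~ x)
  ll_hat  <- as.numeric(logLik(fit))  # at the MLEs (lm's logLik uses the ML sigma)
  ll_true <- sum(dnorm(y, 1 + 2 * x, 1.5, log = TRUE))  # at the true values
  2 * (ll_hat - ll_true)
})
mean(stat)   # ~3 = D: subtracting D would remove the bias, as described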

It sounds like a very simple and clear explanation of the AIC.
Unfortunately, that subtraction is *not* what AIC does: it subtracts
2D.  The reason it does so is unclear to me.  It involves some kind of
prior on models, I think.  As far as I am concerned it is like the
peace of God, in that it passeth human understanding.

Maybe the experts here can give me a simple explanation.  Otherwise
maybe we should honor Fisher (not me) and only subtract  D, and call
the result the FIC.  But that works only for nested hypotheses, and
the main point of the AIC is to deal with non-nested hypotheses.  To
make matters worse, in my field the AIC has the reputation of too
easily favoring the most complex hypothesis, so maybe we should be
subtracting more than 2D, not less.

Clueless in Seattle.

Joe

Joe Felsenstein j...@gs.washington.edu
Department of Genome Sciences and Department of Biology,
University of Washington, Box 355065, Seattle, WA 98195-5065 USA






Re: [R-sig-phylo] LL ratio test

2012-06-13 Thread Carl Boettiger
Hi Anurag, list,

Others on the list can weigh in with more authority, but perhaps this will
get the discussion started.

I think your question gets at how the models are nested.  To ensure
that the likelihood-ratio statistic is chi-square distributed you also
need to make sure that the parameter is not constrained against a
limit (for instance, a Brownian process is only nested in an OU
process by setting alpha to zero, and alpha is frequently constrained
to be positive).  In that case you get a mixture of chi-square and
zero values (there's a rich literature on this, e.g. Ota et al. 2000,
http://mbe.oxfordjournals.org/content/17/5/798.abstract, but for some
reason it is frequently ignored).  That is, if the parameter is
constrained as in the OU/BM case, then you are only testing whether
the parameter estimate is significantly larger than zero; the test is
one-tailed and the statistic is not chi-squared.

Recall that AIC values are a frequentist statistic: they obey the
very same distribution as the likelihood ratio (recall the AIC
difference is a difference of log-likelihoods, just shifted by the
difference in the number of parameters, e.g. -2 [log L1 - log L0 -
(k1 - k0)]).  Recall that the maximum likelihood estimate (MLE) is a
biased estimate of the likelihood of your data, and that the AIC
penalty simply creates an asymptotically unbiased** estimator of the
true model likelihood, which is a frequentist concept to begin with.
Why we report confidence intervals/p-values for one of these
statistics but not the other is not obvious to me either.

-Carl

** the MLE is biased in the frequentist sense, as follows: simulate
data under some true model and evaluate the likelihood of those data
under the true parameters.  Now estimate the parameters from the data
by maximum likelihood.  This second likelihood will always be greater
than or equal to the first, "true" likelihood, making the maximized
likelihood a biased estimate (even though it converges to the true
value).  The corresponding AIC score should be symmetrically
distributed about that true likelihood.
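That recipe translates almost directly into code; a sketch, using a
normal model with both parameters estimated (so k = 2):

set.seed(1)
sims <- t(replicate(2000, {
  x <- rnorm(100, mean = 0, sd = 1)             # data from the true model
  ll_true <- sum(dnorm(x, 0, 1, log = TRUE))    # likelihood at true parameters
  mu <- mean(x); s <- sqrt(mean((x - mu)^2))    # the MLEs
  ll_mle <- sum(dnorm(x, mu, s, log = TRUE))    # maximized likelihood
  c(true = ll_true, mle = ll_mle)
}))
all(sims[, "mle"] >= sims[, "true"])   # TRUE: the maximized value is never smaller
mean(sims[, "mle"] - sims[, "true"])   # ~1 = k/2: the maximized log-likelihood
                                       # is biased upward by about k/2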





On Wed, Jun 13, 2012 at 8:45 AM, Anurag Agrawal aa...@cornell.edu wrote:

 Dear physigs,
 I've been using likelihood ratio tests in various statistical models
 and have seen mixed usage of two- vs. one-tailed tests of the
 difference in the LL of two models.  On the one hand, a one-tailed
 test seems reasonable because removing a parameter can only reduce
 the model fit... on the other hand, perhaps this is accounted for by
 the shape of a chi-square distribution (which is bounded by zero on
 the left).  What should we be doing?  I know I should be using AIC
 values, but I am having difficulty escaping the frequentist paradigm.
 Many thanks, -Anurag

 p.s. this is what Mark Pagel said a few years ago: "When your test
 allows outcomes in either direction (plus or minus) you should set
 alpha in each tail at alpha/2 to have a long-run expectation of
 making Type I errors at rate alpha."






-- 
Carl Boettiger
UC Davis
http://www.carlboettiger.info/


___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo