On 23/09/15 13:39, John Sorkin wrote:

Charles, I am not sure that the answer to my question (given a dataset,
how can one compare the fit of a model that fits the data to a
mixture of two normal distributions with the fit of a model that uses a
single normal distribution?) can be based on the glm model you
suggest.

I have used normalmixEM to fit the data to a mixture of two normal
curves. The model estimates four (perhaps five) parameters: mu_1,
sigma_1^2, mu_2, sigma_2^2 (and perhaps lambda, the mixing proportion;
lambda may not need to be estimated, as it may be determined once one
specifies mu_1, sigma_1^2, mu_2, and sigma_2^2). Your model fits the
data to a model that contains only the mean, and estimates two
parameters, mu_0 and sigma_0^2. I am not sure that your model and mine
can be considered to be nested. If I am correct, I can't compare the
log-likelihood values from the two models. I may be wrong. If I am, I
should be able to perform a likelihood ratio test with 2 (or 3, I am
not sure which) degrees of freedom. Are you suggesting the models are
nested? If so, should I use 3 or 2 degrees of freedom?

You are quite correct; there are subtleties involved here.

The one-component model *is* nested in the two-component model, but is nested "ambiguously".

(1) The null (single-component) model for a mixture distribution is ill-defined. Note that a single component could be achieved either by setting the mixing probabilities equal to (1,0) or (0,1), or by setting mu_1 = mu_2 and sigma_1 = sigma_2.
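
To see the ambiguity concretely, here is a tiny R illustration (the particular numbers are arbitrary); both parameter settings below yield exactly the same single-normal density:

x  <- seq(-4, 4, length.out = 9)
d1 <- 1.0 * dnorm(x, 0, 1) + 0.0 * dnorm(x, 5, 2)  # lambda = (1, 0)
d2 <- 0.3 * dnorm(x, 0, 1) + 0.7 * dnorm(x, 0, 1)  # mu_1 = mu_2, sigma_1 = sigma_2
all.equal(d1, d2)  # TRUE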


(2) However you slice it, the parameter values corresponding to the null model fall on the *boundary* of the parameter space.

(3) Consequently the asymptotics go to hell in a handcart and the likelihood ratio statistic, however you specify the null model, does not have an asymptotic chi-squared distribution.

(4) I have a vague idea that there are ways of obtaining a valid asymptotic null distribution for the LRT but I am not sufficiently knowledgeable to provide any guidance here.

(5) You might be able to gain some insight from delving into the literature --- a reasonable place to start would be with "Finite Mixture Models" by McLachlan and Peel:

@book{mclachlan2000finite,
  title     = {Finite Mixture Models},
  series    = {Wiley Series in Probability and Statistics},
  author    = {McLachlan, Geoffrey and Peel, David},
  year      = {2000},
  publisher = {John Wiley \& Sons},
  address   = {New York}
}

(6) My own approach would be to do "parametric bootstrapping":

* fit (to the real data) the null model and calculate
  the log-likelihood L1, any way you like
* fit the full model and determine the log-likelihood L2
* form the test statistic LRT = 2*(L2 - L1)
* simulate data sets from the fitted parameters of the null model
* for each such simulated data set calculate a test statistic in the
  foregoing manner, obtaining LRT^*_1, ..., LRT^*_N
* the p-value for your test is then

  p = (m+1)/(N+1)

  where m = the number of LRT^*_i values that are greater than LRT

The factor of 2 is of course completely unnecessary. I just put it in "by analogy" with the "real", usual, likelihood ratio statistic.
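
For concreteness, here is a rough sketch of that recipe in R, assuming your data are in a numeric vector x and that the two-component model is fitted with normalmixEM() from the mixtools package (as in your original post); the helper names are just illustrative, and you should not take the code as gospel:

library(mixtools)

## Log-likelihood of the one-component (null) model at its MLEs.
loglikNull <- function(x) {
    mu0 <- mean(x)
    sd0 <- sqrt(mean((x - mu0)^2))  # MLE of sigma (divisor n, not n-1)
    sum(dnorm(x, mean = mu0, sd = sd0, log = TRUE))
}

## Log-likelihood of the two-component model.  normalmixEM() can be
## sensitive to starting values; in practice you might run it from
## several random starts and keep the best fit.
loglikMix <- function(x) {
    fit <- normalmixEM(x, k = 2, maxit = 2000)
    fit$loglik
}

lrtStat <- function(x) 2 * (loglikMix(x) - loglikNull(x))

set.seed(42)                    # for reproducibility
LRT <- lrtStat(x)               # observed test statistic
N   <- 99                       # number of simulated data sets
mu0 <- mean(x)
sd0 <- sqrt(mean((x - mu0)^2))
LRTstar <- replicate(N, lrtStat(rnorm(length(x), mu0, sd0)))
m    <- sum(LRTstar > LRT)
pval <- (m + 1)/(N + 1)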

Note that this p-value is *exact* (not an approximation!) --- for any value of N --- when interpreted with respect to the "total observation procedure" of observing both the real and simulated data. (But see below.) That is, the probability, under the null hypothesis, of observing a test statistic "as extreme as" what you actually observed is *exactly* (m+1)/(N+1). See e.g.:

@article{Barnard1963,
  author  = {Barnard, G. A.},
  title   = {Discussion of ``{T}he spectral analysis of point processes'' by {M}. {S}. {B}artlett},
  journal = {Journal of the Royal Statistical Society, Series {B}},
  volume  = {25},
  year    = {1963},
  pages   = {294}
}

or

@article{Hope1968,
  author  = {Hope, A. C. A.},
  title   = {A simplified {M}onte {C}arlo significance test procedure},
  journal = {Journal of the Royal Statistical Society, Series {B}},
  year    = {1968},
  volume  = {30},
  pages   = {582--598}
}

Taking N=99 (or 999) is arithmetically convenient.
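
The reason the (m+1)/(N+1) recipe works is that, under the null hypothesis, the real and simulated statistics are exchangeable, so the rank of LRT among all N+1 values is uniformly distributed. A quick sanity check in R, with standard normal deviates standing in for the test statistics:

set.seed(1)
N <- 99
pvals <- replicate(10000, {
    stats <- rnorm(N + 1)   # stand-ins for LRT, LRT^*_1, ..., LRT^*_N
    m <- sum(stats[-1] > stats[1])
    (m + 1)/(N + 1)
})
mean(pvals <= 0.05)   # close to 0.05, as an exact test should give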

However I exaggerate when I say that the p-value is exact. It would be exact if you *knew* the parameters of the null model. Since you have to estimate these parameters the test is (a bit?) conservative. Note that the conservatism would be present even if you eschewed the "exact" test in favour of an "approximate" test using a (very) large value of N.

Generally conservatism (in this context! :-) ) is deemed to be no bad thing.

cheers,

Rolf Turner

P. S. I think that the mixing parameter must *always* be estimated. I.e. even if you knew mu_1, mu_2, sigma_1 and sigma_2 you would still have to estimate "lambda". So you have 5 parameters in your full model. Not that this is particularly relevant.

R. T.

--
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276
