Re: [R] Compare two normal to one normal

2015-09-23 Thread Charles C. Berry

On Tue, 22 Sep 2015, John Sorkin wrote:


> Charles,
>
> I am not sure that the answer to my question (given a dataset, how can
> one compare the fit of a model that fits the data to a mixture of two
> normal distributions to the fit of a model that uses a single normal
> distribution?) can be based on the glm model you suggest.


Well you *did* ask how to calculate the log-likelihood of a fitted normal 
density, didn't you? That is what I responded to. You can check that 
result longhand as sum( dnorm( y, y.mean, y.std , log=TRUE ) ) and get the 
same result (as long as you used ML estimates of the mean and standard 
deviation).
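
The equivalence described above can be checked with a short sketch. The seed and the simulated sample of size 10 are arbitrary choices for illustration; the only point is that sd() divides by n - 1, so the ML standard deviation has to be computed by hand.

```r
set.seed(1)                        # arbitrary seed, for reproducibility only
y <- rnorm(10)

y.mean <- mean(y)
# ML estimate of the standard deviation: divisor n, not n - 1 as in sd()
y.std  <- sqrt(mean((y - y.mean)^2))

ll.longhand <- sum(dnorm(y, y.mean, y.std, log = TRUE))
ll.glm      <- as.numeric(logLik(glm(y ~ 1)))

# The two agree up to floating-point error
all.equal(ll.longhand, ll.glm)

# With the usual sd() (divisor n - 1) the value is slightly lower,
# because the ML estimates maximise the log-likelihood
ll.sd <- sum(dnorm(y, y.mean, sd(y), log = TRUE))
ll.sd < ll.longhand
```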





> I have used normalmixEM to fit the data to a mixture of two normal
> curves. The model estimates four (perhaps five) parameters: mu1, sd1^2,
> mu2, sd2^2 (and perhaps lambda, the mixing proportion. The mixing
> proportion may not need to be estimated; it may be determined once one
> specifies mu1, sd1^2, mu2, and sd2^2.) Your model fits the data to a
> model that contains only the mean, and estimates 2 parameters, mu0 and
> sd0^2. I am not sure that your model and mine can be considered to be
> nested. If I am correct I can't compare the log likelihood values from
> the two models. I may be wrong. If I am, I should be able to perform a
> log likelihood test with 2 (or 3, I am not sure which) DFs. Are you
> suggesting the models are nested? If so, should I use 3 or 2 DFs?


As Rolf points out there is a literature on such tests (and Googling 'test 
finite mixture' covers much of it).


Do you really want a test? If you merely want to pick a winner from two 
candidate models there are other procedures. k-fold cross-validation 
of the log-likelihood ratio statistic seems like an easy, natural approach.
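
One way the cross-validation idea might be sketched: fit both models on the training folds and compare their log-likelihoods on the held-out fold. Everything named here (fit_two, ll_two, ll_one, cv_llr, the median-split start, the 200 EM iterations, the simulated data) is a hypothetical stand-in chosen to keep the example self-contained in base R; it is not the routine used elsewhere in this thread.

```r
# Hypothetical helper: bare-bones EM for a two-component normal mixture.
fit_two <- function(y, iter = 200) {
  lo  <- y <= median(y)                  # crude start: split at the median
  lam <- 0.5
  mu  <- c(mean(y[lo]), mean(y[!lo]))
  sg  <- c(sd(y[lo]), sd(y[!lo]))
  for (i in seq_len(iter)) {
    d1 <- lam * dnorm(y, mu[1], sg[1])
    d2 <- (1 - lam) * dnorm(y, mu[2], sg[2])
    # E-step: posterior probability of component 1 (clamped as a numerical guard)
    w   <- pmin(pmax(d1 / (d1 + d2), 1e-8), 1 - 1e-8)
    # M-step: weighted proportion, means and standard deviations
    lam <- mean(w)
    mu  <- c(weighted.mean(y, w), weighted.mean(y, 1 - w))
    sg  <- sqrt(c(weighted.mean((y - mu[1])^2, w),
                  weighted.mean((y - mu[2])^2, 1 - w)))
    sg  <- pmax(sg, 1e-3 * sd(y))        # floor to avoid a collapsing component
  }
  list(lambda = lam, mu = mu, sigma = sg)
}

# Log-likelihood of data y under a fitted mixture f, and under one normal
ll_two <- function(y, f) {
  sum(log(f$lambda * dnorm(y, f$mu[1], f$sigma[1]) +
          (1 - f$lambda) * dnorm(y, f$mu[2], f$sigma[2])))
}
ll_one <- function(y, m, s) sum(dnorm(y, m, s, log = TRUE))

# k-fold cross-validated log-likelihood ratio (mixture minus single normal)
cv_llr <- function(y, k = 10) {
  fold <- sample(rep(seq_len(k), length.out = length(y)))
  per_fold <- vapply(seq_len(k), function(j) {
    train <- y[fold != j]
    test  <- y[fold == j]
    m <- mean(train)
    s <- sqrt(mean((train - m)^2))       # ML estimate, divisor n
    ll_two(test, fit_two(train)) - ll_one(test, m, s)
  }, numeric(1))
  sum(per_fold)
}

set.seed(123)
y <- rnorm(100)                          # stand-in for the real data
d <- cv_llr(y)
d
```

A positive value favours the mixture on held-out data and a negative value favours the single normal. Because the parameters are estimated on the training folds only, the extra flexibility of the mixture is not automatically rewarded.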


HTH,

Chuck

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Compare two normal to one normal

2015-09-22 Thread Rolf Turner


On 23/09/15 13:39, John Sorkin wrote:


> Charles, I am not sure that the answer to my question (given a dataset,
> how can one compare the fit of a model that fits the data to a
> mixture of two normal distributions to the fit of a model that uses a
> single normal distribution?) can be based on the glm model you
> suggest.
>
> I have used normalmixEM to fit the data to a mixture of two normal
> curves. The model estimates four (perhaps five) parameters: mu1, sd1^2,
> mu2, sd2^2 (and perhaps lambda, the mixing proportion. The mixing
> proportion may not need to be estimated; it may be determined once
> one specifies mu1, sd1^2, mu2, and sd2^2.) Your model fits the data
> to a model that contains only the mean, and estimates 2 parameters,
> mu0 and sd0^2. I am not sure that your model and mine can be
> considered to be nested. If I am correct I can't compare the log
> likelihood values from the two models. I may be wrong. If I am, I
> should be able to perform a log likelihood test with 2 (or 3, I am
> not sure which) DFs. Are you suggesting the models are nested? If so,
> should I use 3 or 2 DFs?


You are quite correct; there are subtleties involved here.

The one-component model *is* nested in the two-component model, but is 
nested "ambiguously".


(1) The null (single component) model for a mixture distribution is 
ill-defined.  Note that a single component could be achieved either by 
setting the mixing probabilities equal to (1,0) or (0,1) or by setting

mu_1 = mu_2 and sigma_1 = sigma_2.


(2) However you slice it, the parameter values corresponding to the null 
model fall on the *boundary* of the parameter space.


(3) Consequently the asymptotics go to hell in a handcart and the 
likelihood ratio statistic, however you specify the null model, does not 
have an asymptotic chi-squared distribution.


(4) I have a vague idea that there are ways of obtaining a valid 
asymptotic null distribution for the LRT but I am not sufficiently 
knowledgeable to provide any guidance here.


(5) You might be able to gain some insight from delving into the 
literature --- a reasonable place to start would be with "Finite Mixture 
Models" by McLachlan and Peel:


@book{mclachlan2000finite,
  title={Finite Mixture Models, Wiley Series in
 Probability and Statistics},
  author={McLachlan, G and Peel, D},
  year={2000},
  publisher={John Wiley \& Sons, New York}
}

(6) My own approach would be to do "parametric bootstrapping":

* fit (to the real data) the null model and calculate
  the log-likelihood L1, any way you like
* fit the full model and determine the log-likelihood L2
* form the test statistic LRT = 2*(L2 - L1)
* simulate data sets from the fitted parameters for the null model
* for each such simulated data set calculate a test statistic in the
  foregoing manner, obtaining LRT^*_1, ..., LRT^*_N
* the p-value for your test is then

  p = (m+1)/(N+1)

  where m = the number of LRT^*_i values that are greater than LRT

The factor of 2 is of course completely unnecessary.  I just put it in 
"by analogy" with the "real", usual, likelihood ratio statistic.


Note that this p-value is *exact* (not an approximation!) --- for any 
value of N --- when interpreted with respect to the "total observation
procedure" of observing both the real and simulated data.  (But see 
below.) That is, the probability, under the null hypothesis, of 
observing a test statistic "as extreme as" what you actually observed is 
*exactly* (m+1)/(N+1).  See e.g.:


@article{Barnard1963,
author = {G. A. Barnard},
title  = {Discussion of ``{T}he spectral analysis of point processes'' 
by {M}. {S}. {B}artlett},

journal = {J. Royal Statist. Soc.},
series  = {B},
volume  = {25},
year = {1963},
pages = {294}
}

or

@article{Hope1968,
author =  {A.C.A. Hope},
title =  {A simplified {M}onte {C}arlo significance test procedure},
journal =  {Journal of the Royal Statistical Society, series {B}},
year =  1968,
volume = 30,
pages = {582--598}
}

Taking N=99 (or 999) is arithmetically convenient.

However I exaggerate when I say that the p-value is exact.  It would be 
exact if you *knew* the parameters of the null model.  Since you have to 
estimate these parameters the test is (a bit?) conservative.  Note that 
the conservatism would be present even if you eschewed the "exact" test 
and used an "approximate" test with a (very) large value of N.


Generally conservatism (in this context! :-) ) is deemed to be no bad thing.
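
The parametric bootstrap in (6) might be sketched as follows. This is only an illustration under stated assumptions: loglik_one, loglik_two and lrt are hypothetical helpers, the mixture is fitted by a bare-bones EM with a single median-split start (so it may stop at a local maximum), and the data are simulated stand-ins.

```r
# Log-likelihood of one normal at the ML estimates (divisor n).
loglik_one <- function(y) {
  s <- sqrt(mean((y - mean(y))^2))
  sum(dnorm(y, mean(y), s, log = TRUE))
}

# Log-likelihood of a two-component normal mixture, fitted by a bare-bones EM.
loglik_two <- function(y, iter = 200) {
  lo  <- y <= median(y)                  # crude start: split at the median
  lam <- 0.5
  mu  <- c(mean(y[lo]), mean(y[!lo]))
  sg  <- c(sd(y[lo]), sd(y[!lo]))
  for (i in seq_len(iter)) {
    d1 <- lam * dnorm(y, mu[1], sg[1])
    d2 <- (1 - lam) * dnorm(y, mu[2], sg[2])
    # E-step: posterior probability of component 1 (clamped as a numerical guard)
    w   <- pmin(pmax(d1 / (d1 + d2), 1e-8), 1 - 1e-8)
    # M-step
    lam <- mean(w)
    mu  <- c(weighted.mean(y, w), weighted.mean(y, 1 - w))
    sg  <- sqrt(c(weighted.mean((y - mu[1])^2, w),
                  weighted.mean((y - mu[2])^2, 1 - w)))
    sg  <- pmax(sg, 1e-3 * sd(y))        # floor to avoid a collapsing component
  }
  sum(log(lam * dnorm(y, mu[1], sg[1]) + (1 - lam) * dnorm(y, mu[2], sg[2])))
}

# Test statistic LRT = 2 * (L2 - L1)
lrt <- function(y) 2 * (loglik_two(y) - loglik_one(y))

set.seed(42)
y <- rnorm(100)                          # stand-in for the real data
N <- 99                                  # number of simulated data sets

lrt_obs <- lrt(y)

# Simulate from the fitted null (single normal) model and recompute the statistic
mu0 <- mean(y)
sd0 <- sqrt(mean((y - mu0)^2))
lrt_star <- replicate(N, lrt(rnorm(length(y), mu0, sd0)))

m <- sum(lrt_star >= lrt_obs)            # ">=" so that any ties count as extreme
p <- (m + 1) / (N + 1)
p
```

In practice one would probably replace loglik_two with the log-likelihood reported by a proper mixture routine, run from several starting values, since the same fitting procedure has to be applied to the real data and to every simulated data set.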

cheers,

Rolf Turner

P. S.  I think that the mixing parameter must *always* be estimated. 
I.e. even if you knew mu_1, mu_2, sigma_1 and sigma_2 you would still 
have to estimate "lambda".  So you have 5 parameters in your full model. 
 Not that this is particularly relevant.


R. T.

--
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276


Re: [R] Compare two normal to one normal

2015-09-22 Thread John Sorkin
Charles,
I am not sure that the answer to my question (given a dataset, how can one
compare the fit of a model that fits the data to a mixture of two normal
distributions to the fit of a model that uses a single normal distribution?)
can be based on the glm model you suggest.


I have used normalmixEM to fit the data to a mixture of two normal curves. The
model estimates four (perhaps five) parameters: mu1, sd1^2, mu2, sd2^2 (and
perhaps lambda, the mixing proportion. The mixing proportion may not need to be
estimated; it may be determined once one specifies mu1, sd1^2, mu2, and
sd2^2.) Your model fits the data to a model that contains only the mean, and
estimates 2 parameters, mu0 and sd0^2. I am not sure that your model and mine
can be considered to be nested. If I am correct I can't compare the log
likelihood values from the two models. I may be wrong. If I am, I should be
able to perform a log likelihood test with 2 (or 3, I am not sure which) DFs.
Are you suggesting the models are nested? If so, should I use 3 or 2 DFs?


Many thanks,
John





John David Sorkin M.D., Ph.D.
Professor of Medicine
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology and Geriatric 
Medicine
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing) 

>>> "Charles C. Berry"  09/22/15 6:23 PM >>>
On Tue, 22 Sep 2015, John Sorkin wrote:

>
> In any event, I still don't know how to fit a single normal distribution 
> and get a measure of fit e.g. log likelihood.
>

Gotta love R:

> y <- rnorm(10)
> logLik(glm(y~1))
'log Lik.' -17.36071 (df=2)

HTH,

Chuck





Confidentiality Statement:
This email message, including any attachments, is for the sole use of the 
intended recipient(s) and may contain confidential and privileged information. 
Any unauthorized use, disclosure or distribution is prohibited. If you are not 
the intended recipient, please contact the sender by reply email and destroy 
all copies of the original message. 


Re: [R] Compare two normal to one normal

2015-09-22 Thread Bert Gunter
Two normals will **always** be a better fit than one, as the latter
must be a subset of the former (with identical parameters for both
normals).

Cheers,
Bert


Bert Gunter

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
   -- Clifford Stoll


On Tue, Sep 22, 2015 at 1:21 PM, John Sorkin
 wrote:
> I have data that may be the mixture of two normal distributions (one 
> contained within the other) vs. a single normal.
> I used normalmixEM to get estimates of parameters assuming two normals:
>
>
> GLUT <- scale(na.omit(data[,"FCW_glut"]))
> GLUT
> mixmdl = normalmixEM(GLUT,k=2,arbmean=TRUE)
> summary(mixmdl)
> plot(mixmdl,which=2)
> lines(density(data[,"GLUT"]), lty=2, lwd=2)
>
>
>
>
>
> summary of normalmixEM object:
>            comp 1   comp 2
> lambda  0.7035179 0.296482
> mu     -0.0592302 0.140545
> sigma   1.1271620 0.536076
> loglik at estimate:  -110.8037
>
>
>
> I would like to see if the two normal distributions are a better fit than one 
> normal. I have two problems:
> (1) normalmixEM does not seem to want to fit a single normal (even if I 
> address the error message produced):
>
>
>> mixmdl = normalmixEM(GLUT,k=1)
> Error in normalmix.init(x = x, lambda = lambda, mu = mu, s = sigma, k = k,  :
>   arbmean and arbvar cannot both be FALSE
>> mixmdl = normalmixEM(GLUT,k=1,arbmean=TRUE)
> Error in normalmix.init(x = x, lambda = lambda, mu = mu, s = sigma, k = k,  :
>   arbmean and arbvar cannot both be FALSE
>
>
>
> (2) Even if I had the loglik from a single normal, I am not sure how many DFs 
> to use when computing the -2LL ratio test.
>
>
> Any suggestions for comparing the two-normal vs. one normal distribution 
> would be appreciated.
>
>
> Thanks
> John


Re: [R] Compare two normal to one normal

2015-09-22 Thread Mark Leeds
That's true, but if he uses some AIC or BIC criterion that penalizes the
number of parameters, then he might see something else? This (comparing
mixtures to non-mixtures) is not something I deal with, so I'm just throwing
it out there.
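
As a rough sketch of what such a comparison might look like, AIC and BIC can be computed directly from a log-likelihood, the number of estimated parameters k and the sample size n. The helper ic and the simulated data are assumptions for illustration; the result for the single normal is cross-checked against the built-in AIC() and BIC().

```r
# Hypothetical helper: information criteria from a log-likelihood
ic <- function(loglik, k, n) {
  c(AIC = -2 * loglik + 2 * k,
    BIC = -2 * loglik + log(n) * k)
}

set.seed(7)
y   <- rnorm(50)                   # stand-in for the real data
fit <- glm(y ~ 1)                  # single normal: k = 2 (mean and variance)

one <- ic(as.numeric(logLik(fit)), k = 2, n = length(y))
one

# Agrees with the built-in generics
all.equal(unname(one), c(AIC(fit), BIC(fit)))

# For the two-component fit one would use k = 5 (two means, two standard
# deviations, one mixing proportion), e.g. with the names used in this thread:
# ic(mixmdl$loglik, k = 5, n = length(GLUT))
```

The smaller value is preferred under either criterion. The usual justification for these criteria rests on the same regularity conditions as the likelihood ratio test, so for mixtures the comparison is probably best treated as a heuristic.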






Re: [R] Compare two normal to one normal

2015-09-22 Thread John Sorkin
Bert,
Better, perhaps, but will something like the LR test be significant?
Adding an extra parameter to a linear regression almost always improves the R2,
yet if one compares models, the model with the extra parameter is not always
significantly better.
John
P.S. Please forgive the appeal to "significantly better" . . .




Re: [R] Compare two normal to one normal

2015-09-22 Thread John Sorkin
I am not sure AIC or BIC would be needed, as the two-normal distribution has at
least two additional parameters to estimate (mean 1, var 1, mean 2, var 2),
whereas the one normal has to estimate only mean 1 and var 1. In any event, I
don't know how to fit the single normal and get values for the loglik, let
alone AIC or BIC.
John





Re: [R] Compare two normal to one normal

2015-09-22 Thread Bert Gunter
I'll be brief in my reply to you both, as this is off topic.

So what?  All this statistical stuff is irrelevant baloney (and of
questionable accuracy, since based on asymptotics and strong
assumptions, anyway). The question of interest is whether a mixture
fit better suits the context, which only the OP knows and which none
of us can answer.

I know that many will disagree with this -- maybe a few might agree --
but please send all replies, insults, praise, and learned discourse to
me privately,  as I have already occupied more space on the list than
I should.

Cheers,
Bert


Bert Gunter

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
   -- Clifford Stoll




Re: [R] Compare two normal to one normal

2015-09-22 Thread John Sorkin
Bert,
I am surprised by your response. Statistics serves two purposes: estimation and
hypothesis testing. Sometimes we are fortunate and theory, physiology, physics,
or something else tells us what is the correct, or perhaps I should say most
adequate, model. Sometimes theory fails us and we wish to choose between two
competing models. This is my case. The cell sizes may come from one normal
distribution (theory 1) or two (theory 2). Choosing between the models will
help us postulate about physiology. I want to use statistics to help me decide
between the two competing models, and thus inform my understanding of
physiology. It is true that statistics can't tell me which model is the
"correct" or "true" model, but it should be able to help me select the more
"adequate" or "appropriate" or "closer to the truth" model.


In any event, I still don't know how to fit a single normal distribution and 
get a measure of fit e.g. log likelihood.


John


John David Sorkin M.D., Ph.D.
Professor of Medicine
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology and Geriatric 
Medicine
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing) 

>>> Bert Gunter  09/22/15 4:48 PM >>>
I'll be brief in my reply to you both, as this is off topic.

So what?  All this statistical stuff is irrelevant baloney(and of
questionable accuracy, since based on asymptotics and strong
assumptions, anyway) . The question of interest is whether a mixture
fit better suits the context, which only the OP knows and which none
of us can answer.

I know that many will disagree with this -- maybe a few might agree --
but please send all replies, insults, praise, and learned discourse to
me privately,  as I have already occupied more space on the list than
I should.

Cheers,
Bert


Bert Gunter

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
   -- Clifford Stoll


On Tue, Sep 22, 2015 at 1:35 PM, Mark Leeds  wrote:
> That's true but if he uses some AIC or BIC criterion that penalizes the
> number of parameters,
> then he might see something else ? This ( comparing mixtures to not mixtures
> ) is not something I deal with so I'm just throwing it out there.
>
>
>
>
> On Tue, Sep 22, 2015 at 4:30 PM, Bert Gunter  wrote:
>>
>> Two normals will **always** be a better fit than one, as the latter
>> must be a subset of the former (with identical parameters for both
>> normals).
>>
>> Cheers,
>> Bert
>>
>>
>> Bert Gunter
>>
>> "Data is not information. Information is not knowledge. And knowledge
>> is certainly not wisdom."
>>-- Clifford Stoll
>>
>>
>> On Tue, Sep 22, 2015 at 1:21 PM, John Sorkin
>>  wrote:
>> > I have data that may be the mixture of two normal distributions (one
>> > contained within the other) vs. a single normal.
>> > I used normalmixEM to get estimates of parameters assuming two normals:
>> >
>> >
>> > GLUT <- scale(na.omit(data[,"FCW_glut"]))
>> > GLUT
>> > mixmdl = normalmixEM(GLUT,k=1,arbmean=TRUE)
>> > summary(mixmdl)
>> > plot(mixmdl,which=2)
>> > lines(density(data[,"GLUT"]), lty=2, lwd=2)
>> >
>> >
>> >
>> >
>> >
>> > summary of normalmixEM object:
>> >comp 1   comp 2
>> > lambda  0.7035179 0.296482
>> > mu -0.0592302 0.140545
>> > sigma   1.1271620 0.536076
>> > loglik at estimate:  -110.8037
>> >
>> >
>> >
>> > I would like to see if the two normal distributions are a better fit
>> > than one normal. I have two problems
>> > (1) normalmixEM does not seem to want to fit a single normal (even if I
>> > address the error message produced):
>> >
>> >
>> >> mixmdl = normalmixEM(GLUT,k=1)
>> > Error in normalmix.init(x = x, lambda = lambda, mu = mu, s = sigma, k =
>> > k,  :
>> >   arbmean and arbvar cannot both be FALSE
>> >> mixmdl = normalmixEM(GLUT,k=1,arbmean=TRUE)
>> > Error in normalmix.init(x = x, lambda = lambda, mu = mu, s = sigma, k =
>> > k,  :
>> >   arbmean and arbvar cannot both be FALSE
>> >
>> >
>> >
>> > (2) Even if I had the loglik from a single normal, I am not sure how
>> > many DFs to use when computing the -2LL ratio test.
>> >
>> >
>> > Any suggestions for comparing the two-normal vs. one normal distribution
>> > would be appreciated.
>> >
>> >
>> > Thanks
>> > John
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> > John David Sorkin M.D., Ph.D.
>> > Professor of Medicine
>> > Chief, Biostatistics and Informatics
>> > University of Maryland School of Medicine Division of Gerontology and
>> > Geriatric Medicine
>> > Baltimore VA Medical Center
>> > 10 North Greene Street
>> > GRECC (BT/18/GR)
>> > Baltimore, MD 21201-1524
>> > (Phone) 410-605-7119
>> > (Fax) 410-605-7913 (Please call phone number above prior to faxing)
>> >
>> >
>> > Confidentiality 

Re: [R] Compare two normal to one normal

2015-09-22 Thread Mark Leeds
Hi John: For the log likelihood in the single-normal case, you can just
calculate it directly using the normal density: it is the sum from i = 1
to n of log f(x_i, uhat, sigmahat), where f(x_i, uhat, sigmahat) is the
density of the normal with that mean and standard deviation, so you can
use dnorm with log = TRUE. Of course you need to estimate the parameters
uhat and sigmahat first, but for the single normal case they are just the
sample mean and the sample standard deviation (computed with divisor n
rather than n - 1 if you want the ML estimate).
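
The recipe above can be sketched in a few lines of R. This is only an
illustration under stated assumptions: the data are simulated rather than
John's, and the names x, mu_hat, sigma_hat and ll_single are made up for
the example.

```r
# Simulated stand-in for the real data (an assumption for illustration).
set.seed(1)
x <- rnorm(50)

n <- length(x)
mu_hat <- mean(x)
# ML estimate of the standard deviation: divisor n, not n - 1.
sigma_hat <- sqrt(sum((x - mu_hat)^2) / n)

# Log-likelihood = sum of the log densities at the ML estimates.
ll_single <- sum(dnorm(x, mean = mu_hat, sd = sigma_hat, log = TRUE))
ll_single
```

Using sd(x) instead of sigma_hat would give a slightly smaller value,
because sd() divides by n - 1 and so is not the maximizer.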

Note though: If you are going to calculate a log likelihood ratio, make
sure you compare apples and apples and not apples and oranges, in the
sense that the loglikelihood that comes out of the mixture case may
include constants such as 1/sqrt(2*pi) etc.
So you need to know EXACTLY how the mixture algorithm is calculating its
log likelihood.

In fact, it may be better and safer to just calculate the loglikelihood for
the mixture yourself also, so the sum from i = 1 to n of
log[ lambda*f(x_i, mu1hat, sigma1hat) + (1-lambda)*f(x_i, mu2hat, sigma2hat) ].
By calculating it yourself and being consistent, you then know that you
will be comparing apples and apples.
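
A minimal sketch of that longhand mixture calculation follows. The
function name mix_loglik, the simulated data and the parameter values are
all made up for illustration; with a real fit you would plug in the
estimates reported by the mixture routine.

```r
# Log-likelihood of a two-component normal mixture, computed longhand:
# sum over i of log[ lambda * f1(x_i) + (1 - lambda) * f2(x_i) ].
mix_loglik <- function(x, lambda, mu1, sigma1, mu2, sigma2) {
  dens <- lambda * dnorm(x, mean = mu1, sd = sigma1) +
    (1 - lambda) * dnorm(x, mean = mu2, sd = sigma2)
  sum(log(dens))
}

# Simulated data and hypothetical parameter values, for illustration only.
set.seed(2)
x <- c(rnorm(70, mean = 0, sd = 1), rnorm(30, mean = 0.5, sd = 0.5))
ll_mix <- mix_loglik(x, lambda = 0.7, mu1 = 0, sigma1 = 1,
                     mu2 = 0.5, sigma2 = 0.5)
ll_mix
```

Because this uses dnorm for both components, the 1/sqrt(2*pi) constants are
treated exactly as in the single-normal calculation. When both components
are given the same mean and standard deviation, the function returns the
single-normal log-likelihood, which is a handy sanity check.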

As I said earlier, another way is by comparing AICs. In that case, you
calculate it in both cases and see which AIC is lower. Lower wins, and it
penalizes for the number of parameters. There are asymptotics required in
both the LRT approach and the AIC approach, so you can pick your
poison !!! :).
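
The AIC comparison can be sketched as follows. Everything here is
illustrative: the data are simulated, aic() is a helper written for this
example, and the mixture log-likelihood is a placeholder value rather than
the result of an actual fit. The parameter counts assume 2 for the single
normal (mean, sd) and 5 for the two-component mixture (two means, two sds,
one mixing proportion).

```r
# AIC = 2 * (number of estimated parameters) - 2 * log-likelihood.
aic <- function(loglik, n_par) 2 * n_par - 2 * loglik

# Simulated data, for illustration only.
set.seed(3)
x <- rnorm(80)

# Single normal: ML estimates, 2 parameters.
s_hat <- sqrt(mean((x - mean(x))^2))
ll_one <- sum(dnorm(x, mean = mean(x), sd = s_hat, log = TRUE))
aic_one <- aic(ll_one, n_par = 2)

# Two-component mixture: 5 parameters.
# PLACEHOLDER: pretend the mixture fit gained 1.5 log-likelihood units;
# a real value would come from the mixture fit itself.
ll_two <- ll_one + 1.5
aic_two <- aic(ll_two, n_par = 5)

c(one_normal = aic_one, two_normals = aic_two)
```

With this placeholder the mixture improves the log-likelihood, but not by
enough to pay for three extra parameters, so the single normal has the
lower AIC.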

On Tue, Sep 22, 2015 at 6:01 PM, John Sorkin 
wrote:

> Bert
> I am surprised by your response. Statistics serves two purposes:
> estimation and hypothesis testing. Sometimes we are fortunate and theory,
> physiology, physics, or something else tell us what is the correct, or
> perhaps I should say most adequate, model. Sometimes theory fails us and we
> wish to choose between two competing models. This is my case.  The cell
> sizes may come from one normal distribution (theory 1) or two (theory 2).
> Choosing between the models will help us postulate about physiology. I want
> to use statistics to help me decide between the two competing models, and
> thus inform my understanding of physiology. It is true that statistics
> can't tell me which model is the "correct" or "true" model, but it should
> be able to help me select the more "adequate" or "appropriate" or "closer
> to the truth" model.
>
> In any event, I still don't know how to fit a single normal distribution
> and get a measure of fit e.g. log likelihood.
>
> John
>
>
> John David Sorkin M.D., Ph.D.
> Professor of Medicine
> Chief, Biostatistics and Informatics
> University of Maryland School of Medicine Division of Gerontology and
> Geriatric Medicine
> Baltimore VA Medical Center
> 10 North Greene Street
> GRECC (BT/18/GR)
> Baltimore, MD 21201-1524
> (Phone) 410-605-7119
> (Fax) 410-605-7913 (Please call phone number above prior to faxing)
>
> >>> Bert Gunter  09/22/15 4:48 PM >>>
> I'll be brief in my reply to you both, as this is off topic.
>
> So what? All this statistical stuff is irrelevant baloney (and of
> questionable accuracy, since based on asymptotics and strong
> assumptions, anyway). The question of interest is whether a mixture
> fit better suits the context, which only the OP knows and which none
> of us can answer.
>
> I know that many will disagree with this -- maybe a few might agree --
> but please send all replies, insults, praise, and learned discourse to
> me privately, as I have already occupied more space on the list than
> I should.
>
> Cheers,
> Bert
>
>
> Bert Gunter
>
> "Data is not information. Information is not knowledge. And knowledge
> is certainly not wisdom."
> -- Clifford Stoll
>
>
> On Tue, Sep 22, 2015 at 1:35 PM, Mark Leeds  wrote:
> > That's true but if he uses some AIC or BIC criterion that penalizes the
> > number of parameters,
> > then he might see something else ? This ( comparing mixtures to not
> mixtures
> > ) is not something I deal with so I'm just throwing it out there.
> >
> >
> >
> >
> > On Tue, Sep 22, 2015 at 4:30 PM, Bert Gunter 
> wrote:
> >>
> >> Two normals will **always** be a better fit than one, as the latter
> >> must be a subset of the former (with identical parameters for both
> >> normals).
> >>
> >> Cheers,
> >> Bert
> >>
> >>
> >> Bert Gunter
> >>
> >> "Data is not information. Information is not knowledge. And knowledge
> >> is certainly not wisdom."
> >> -- Clifford Stoll
> >>
> >>
> >> On Tue, Sep 22, 2015 at 1:21 PM, John Sorkin
> >>  wrote:
> >> > I have data that may be the mixture of two normal distributions (one
> >> > contained within the other) vs. a single normal.
> >> > I used normalmixEM to get estimates of parameters assuming two
> normals:
> >> >
> >> >
> >> > GLUT <- scale(na.omit(data[,"FCW_glut"]))
> >> > GLUT
> >> > mixmdl = normalmixEM(GLUT,k=1,arbmean=TRUE)
> >> > summary(mixmdl)
> >> > plot(mixmdl,which=2)
> >> > 

Re: [R] Compare two normal to one normal

2015-09-22 Thread Mark Leeds
John: After I sent what I wrote, I read Rolf's intelligent response. I
didn't realize that there are boundary issues, so yes, he's correct and my
approach is EL WRONGO. I feel bad that I sent that email, given that it's
totally wrong. My apologies for the noise, and thanks, Rolf, for the
correct response.

Oh, one thing that does still hold in my response is the AIC approach,
unless Rolf tells us that it's not valid also. I don't see why it wouldn't
be, though, because you're not doing a hypothesis test when you go the AIC
route.







Re: [R] Compare two normal to one normal

2015-09-22 Thread Charles C. Berry

On Tue, 22 Sep 2015, John Sorkin wrote:



In any event, I still don't know how to fit a single normal distribution 
and get a measure of fit e.g. log likelihood.




Gotta love R:


y <- rnorm(10)
logLik(glm(y~1))

'log Lik.' -17.36071 (df=2)
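
As a hedged cross-check of the glm route, the sketch below compares
logLik(glm(y ~ 1)) with the longhand dnorm sum. The seed and the names
ll_glm, y_sd_ml and ll_hand are assumptions added for the example, so the
numbers will not match the output shown above.

```r
set.seed(4)
y <- rnorm(10)

# Log-likelihood of an intercept-only Gaussian model.
ll_glm <- logLik(glm(y ~ 1))

# Longhand, using the ML standard deviation (divisor n, not n - 1).
y_sd_ml <- sqrt(mean((y - mean(y))^2))
ll_hand <- sum(dnorm(y, mean = mean(y), sd = y_sd_ml, log = TRUE))

all.equal(as.numeric(ll_glm), ll_hand)
attr(ll_glm, "df")  # parameters counted: the mean and the variance
```

The df attribute is what counts the single normal as a two-parameter
model.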

HTH,

Chuck
