Re: fit of logistic regression model

Herman Rubin Sun, 18 Jan 2004 11:37:20 -0800

In article <[EMAIL PROTECTED]>,
Koh Puay Ping <[EMAIL PROTECTED]> wrote:
>Hi all, I have some questions on logistic regression.


>When we are finding a multivariate model (using proc logistic, SAS), I
>understand that we should perform univariate analysis first to
>identify the variables with (Pr < 0.25) for multivariate modelling.

Where did you get this?  A variable can be very important
for the multivariate model and not look at all good by
itself.  Also, a variable can be the one giving the best
fit by itself and be irrelevant for the multivariate
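model.  An example of this for linear models is a variable
equal to the actual prediction of the best model plus a
small error: by itself it gives the best fit, but it adds
nothing once the variables of that model are included.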
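
Both situations can be reproduced on simulated data. The following is a minimal sketch for the linear-model case, using only the Python standard library. All names (x1, x2, z, w1, w2, t, r_squared) and the simulated coefficients are assumptions made for illustration and are not taken from the original question.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        # Partial pivoting: bring the largest entry of column c to the diagonal.
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [u - f * v for u, v in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]

def r_squared(columns, y):
    """R^2 of a least-squares fit of y on an intercept plus the given columns."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in X]
    ybar = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

random.seed(0)
n = 500

# Case 1: z is the best model's prediction plus a small error.
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [random.gauss(0, 1) for _ in range(n)]
y = [a + b + random.gauss(0, 0.5) for a, b in zip(x1, x2)]
z = [a + b + random.gauss(0, 0.1) for a, b in zip(x1, x2)]

r2_x1 = r_squared([x1], y)
r2_x2 = r_squared([x2], y)
r2_z = r_squared([z], y)            # best single predictor
r2_x1_x2 = r_squared([x1, x2], y)
r2_all = r_squared([x1, x2, z], y)  # z adds almost nothing here

# Case 2: w2 looks useless alone but matters jointly with w1.
signal = [random.gauss(0, 1) for _ in range(n)]
nuisance = [random.gauss(0, 1) for _ in range(n)]
w1 = [s + u for s, u in zip(signal, nuisance)]
w2 = nuisance
t = [s + random.gauss(0, 0.1) for s in signal]

r2_w2_alone = r_squared([w2], t)
r2_w1_alone = r_squared([w1], t)
r2_w1_w2 = r_squared([w1, w2], t)

for name, value in [("x1", r2_x1), ("x2", r2_x2), ("z", r2_z),
                    ("x1+x2", r2_x1_x2), ("x1+x2+z", r2_all),
                    ("w2", r2_w2_alone), ("w1", r2_w1_alone),
                    ("w1+w2", r2_w1_w2)]:
    print(f"R^2 using {name}: {value:.3f}")
```

In this setup a univariate screen would rank z first and discard w2, which is the opposite of what the joint models need.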

>The first variable to include shall be the one with the greatest
>difference in -2 log likelihood (intercept only versus intercept and
>covariates).  The second variable to include shall be the one with the
>greatest difference in this value, now with the first variable in the
>model.  Next, the variable that is already in the model is checked for
>validity with the new variable added.  This carries on until the
>difference in -2 log likelihood from the last model is no longer
>significant.  This would mean that no variable could be added or
>removed from the model.  Then possible interaction terms are tested by
>adding them one by one to check their significance.

Stepwise regression is far from optimal.
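
For reference, the forward part of the procedure quoted above can be sketched as follows. This is an illustration only, not PROC LOGISTIC's implementation: the Newton-Raphson fit, the entry threshold of 3.84 (the 0.95 quantile of chi-square with 1 degree of freedom), the function names, and the simulated data are all assumptions. The backward check and the interaction step are left out.

```python
import math
import random

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [u - f * v for u, v in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]

def neg2_log_lik(X, y, beta):
    """-2 log likelihood of a logistic model at coefficients beta."""
    total = 0.0
    for row, yi in zip(X, y):
        eta = sum(bj * xj for bj, xj in zip(beta, row))
        # log(1 + exp(eta)), written to avoid overflow for large |eta|
        total += yi * eta - (max(eta, 0.0) + math.log1p(math.exp(-abs(eta))))
    return -2.0 * total

def fit_logistic(columns, y, iters=25):
    """Fit by Newton-Raphson (intercept included); return -2 log L at the fit."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        p = [1.0 / (1.0 + math.exp(-sum(bj * xj for bj, xj in zip(beta, row))))
             for row in X]
        grad = [sum((yi - pi) * row[a] for row, yi, pi in zip(X, y, p))
                for a in range(k)]
        hess = [[sum(pi * (1.0 - pi) * row[a] * row[b] for row, pi in zip(X, p))
                 for b in range(k)] for a in range(k)]
        step = solve(hess, grad)
        beta = [bj + sj for bj, sj in zip(beta, step)]
    return neg2_log_lik(X, y, beta)

def forward_select(candidates, y, threshold=3.84):
    """Add, one at a time, the variable giving the largest drop in -2 log L."""
    chosen = []
    current = fit_logistic([], y)  # intercept-only model
    while True:
        best = None
        for name in candidates:
            if name in chosen:
                continue
            dev = fit_logistic([candidates[c] for c in chosen + [name]], y)
            if best is None or current - dev > best[1]:
                best = (name, current - dev, dev)
        if best is None or best[1] < threshold:
            return chosen, current
        chosen.append(best[0])
        current = best[2]

# Hypothetical data: x1 and x2 drive the outcome, x3 is pure noise.
random.seed(1)
n = 400
data = {name: [random.gauss(0, 1) for _ in range(n)] for name in ("x1", "x2", "x3")}
y = [1 if random.random() < 1.0 / (1.0 + math.exp(-(1.5 * a - 1.5 * b))) else 0
     for a, b in zip(data["x1"], data["x2"])]

null_dev = fit_logistic([], y)
chosen, final_dev = forward_select(data, y)
print("selected:", chosen)
print("-2 log L: intercept only", round(null_dev, 2), "selected model", round(final_dev, 2))
```

The search is greedy: each step conditions on the choices already made, so it never compares all subsets of variables.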

>I have done this but found that the fit of the model is still not very
>good. But when I remove one of the independent variables that was found
>to be highly significant (Pr < 0.0001, and largest difference of -2 log
>likelihood), the fit of the model improved greatly. So my questions
>are:

Was this before or after the other variables were included?

In any case, I do not see how removing a variable, assuming
that there are a fair number of degrees of freedom and not
too many parameters, can improve the fit.  It might pay to
look at the likelihood function, not just the MLE.
Sometimes the shape for non-linear models is rather odd.
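
The point about the likelihood can be made concrete with a small sketch. The reduced model is the full model with one coefficient fixed at zero, so the maximized likelihood of the full model cannot be lower. The code below evaluates the log-likelihood of a one-coefficient, no-intercept logistic model on a grid. The model form, the grid, and the simulated data are assumptions chosen to keep the example short. Fit statistics other than the likelihood need not behave this way.

```python
import math
import random

def log_lik(b, x, y):
    """Log-likelihood of the model P(y = 1) = 1 / (1 + exp(-b * x))."""
    total = 0.0
    for xi, yi in zip(x, y):
        eta = b * xi
        # log(1 + exp(eta)), written to avoid overflow for large |eta|
        total += yi * eta - (max(eta, 0.0) + math.log1p(math.exp(-abs(eta))))
    return total

# Hypothetical data with a true coefficient of 0.8.
random.seed(2)
x = [random.gauss(0, 1) for _ in range(200)]
y = [1 if random.random() < 1.0 / (1.0 + math.exp(-0.8 * xi)) else 0 for xi in x]

# Look at the whole likelihood function, not only its maximum.
grid = [i / 10 for i in range(-30, 31)]
profile = [(b, log_lik(b, x, y)) for b in grid]

ll_reduced = log_lik(0.0, x, y)                     # variable removed (b = 0)
b_best, ll_full = max(profile, key=lambda t: t[1])  # variable kept

print("log L with the variable removed:", round(ll_reduced, 3))
print("best grid value of b:", b_best, "log L:", round(ll_full, 3))
for b, ll in profile[::10]:
    print(f"b = {b:+.1f}  log L = {ll:.3f}")
```

Printing or plotting the grid values shows how flat or skewed the likelihood is around its maximum, which a single estimate and p-value do not reveal.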

And above all, statistical significance is a very poor
criterion.  

>1) What contributes to the above scenario?

>2) If the fit of the model obtained by the method above is found to be
>not good, what can we conclude about the independent variables that are
>used to form the model?

>3) If the fit of the model is good but some of the variables in the
>model are not significant or there are still unknown/unidentified
>variables that are significant, is this model better than the above one?
>And what can we then conclude from this model?

>Thanks for all valuable comments.


-- 
This address is for information only.  I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Department of Statistics, Purdue University
[EMAIL PROTECTED]         Phone: (765)494-6054   FAX: (765)494-0558
.
.
=================================================================
Instructions for joining and leaving this list, remarks about the
problem of INAPPROPRIATE MESSAGES, and archives are available at:
.                  http://jse.stat.ncsu.edu/                    .
=================================================================
