Hi all, I have some questions about logistic regression.

When building a multivariable model (using PROC LOGISTIC in SAS), I
understand that we should first run a univariate analysis to identify
the variables with Pr < 0.25 as candidates for the multivariable model.
The first variable to enter is the one giving the greatest difference
in -2 log likelihood (intercept only minus intercept and covariates).
The second variable to enter is the one giving the greatest difference
in this value with the first variable already in the model. Next, each
variable already in the model is checked to see whether it is still
significant with the new variable added. This carries on until the
difference in -2 log likelihood between the new model and the previous
one is no longer significant. This would mean that no variable can be
added to or removed from the model. Then possible interaction terms
are tested by adding them one at a time to check their significance.
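
For reference, the entry step described above can be sketched as
follows. This is only an illustration in Python rather than SAS, and it
assumes each candidate adds exactly one parameter (a continuous or
binary covariate), so that the difference in -2 log likelihood can be
compared to a chi-square distribution with 1 degree of freedom. The
function names, the 0.05 entry level and the -2 log L values in the
usage lines are all made up for the example.

```python
import math


def lr_test_1df(neg2ll_reduced, neg2ll_full):
    """Likelihood-ratio test for adding ONE parameter to a nested model.

    Both arguments are -2 log L values, as reported in model fit output.
    Returns (statistic, p_value).
    """
    stat = neg2ll_reduced - neg2ll_full
    if stat < 0:
        raise ValueError("full model should not fit worse than the reduced one")
    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


def pick_next_variable(neg2ll_current, candidates, alpha_entry=0.05):
    """Choose the candidate giving the largest drop in -2 log L.

    candidates maps a variable name to the -2 log L of the current model
    refitted with that variable added (hypothetical inputs; the fitting
    itself is assumed to have been done elsewhere).
    Returns (name, statistic, p_value), or None if nothing qualifies.
    """
    best = None
    for name, neg2ll in candidates.items():
        stat, p_value = lr_test_1df(neg2ll_current, neg2ll)
        if best is None or stat > best[1]:
            best = (name, stat, p_value)
    if best is None or best[2] >= alpha_entry:
        return None  # no candidate meets the assumed entry level
    return best


# Made-up -2 log L values, for illustration only.
choice = pick_next_variable(250.0, {"age": 238.4, "sex": 247.9, "dose": 244.1})
print(choice)
```

The same comparison, run with a variable taken out instead of put in,
gives the check on variables already in the model.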

I have done this but found that the fit of the model is still not very
good. But when I remove one of the independent variables that was found
to be highly significant (Pr < 0.0001, and the largest difference in
-2 log likelihood), the fit of the model improves greatly. So my
questions are:

1) What contributes to the above scenario?

2) If the fit of the model obtained by the method above is found to be
not good, what can we conclude about the independent variables that
were used to form the model?

3) If the fit of the model is good but some of the variables in the
model are not significant, or there are still unknown/unidentified
variables that are significant, is this model better than the one
above? And what can we then conclude from this model?

Thanks for all valuable comments.