Hi all, I have some questions on logistic regression. When building a multivariate model (using PROC LOGISTIC in SAS), I understand that we should first perform univariate analyses to identify the variables with Pr < 0.25 as candidates for multivariate modelling. The first variable to include is the one with the greatest difference in -2 log likelihood (intercept only minus intercept and covariates). The second variable to include is the one with the greatest difference in this value, now with the first variable already in the model. Next, each variable already in the model is checked to see whether it remains significant with the new variable added. This carries on until the -2 log likelihood no longer differs significantly from that of the previous model. This would mean that no variable can be added to or removed from the model. Then possible interaction terms are tested by adding them one at a time to see whether each is significant.
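To make the selection step above concrete, here is a minimal Python sketch (not SAS) of the comparison I mean. It assumes each candidate adds exactly one parameter, so the drop in -2 log likelihood can be referred to a chi-square distribution with 1 degree of freedom. The function names and all the -2 log L values are made up purely for illustration; they are not output from PROC LOGISTIC.

```python
import math

def lr_test_1df(neg2ll_reduced, neg2ll_full):
    """Likelihood-ratio test for adding ONE parameter to a nested model.

    Returns (G, p), where G is the drop in -2 log L and p is the
    upper-tail chi-square probability with 1 degree of freedom.
    """
    g = neg2ll_reduced - neg2ll_full
    # For 1 df: P(X > g) = P(|Z| > sqrt(g)) = erfc(sqrt(g / 2))
    p = math.erfc(math.sqrt(max(g, 0.0) / 2.0))
    return g, p

def best_candidate(neg2ll_current, candidates):
    """Pick the candidate whose model has the smallest -2 log L,
    i.e. the largest drop from the current model, and test it."""
    name = min(candidates, key=candidates.get)
    g, p = lr_test_1df(neg2ll_current, candidates[name])
    return name, g, p

# Hypothetical values: -2 log L of the intercept-only model, and of the
# model with each single candidate variable added (illustration only).
neg2ll_intercept_only = 250.0
candidates = {"age": 238.4, "dose": 221.7, "sex": 249.1}

name, g, p = best_candidate(neg2ll_intercept_only, candidates)
print(f"enter {name}: G = {g:.1f}, p = {p:.2g}")
```

The same comparison would then be repeated with the chosen variable in the model, and run in reverse (dropping one variable at a time) to check whether variables already entered should stay.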
I have done this but found that the fit of the model is still not very good. However, when I remove one of the independent variables that was found to be highly significant (Pr < 0.0001, and the largest difference in -2 log likelihood), the fit of the model improves greatly. So my questions are:
1) What could cause the above scenario?
2) If the fit of the model obtained by the above method is found to be poor, what can we conclude about the independent variables used to form the model?
3) If the fit of the model is good but some of the variables in the model are not significant, or there are still unknown/unidentified variables that are significant, is this model better than the one above? And what can we then conclude from this model?
Thanks for all valuable comments.
