In article <[EMAIL PROTECTED]>,
Koh Puay Ping <[EMAIL PROTECTED]> wrote:
>Hi all, I have some questions on logistic regression.
>When we are finding a multivariate model (using proc logistic, SAS), I
>understand that we should perform univariate analysis first to
>identify the variables with (Pr < 0.25) for multivariate modelling.

Where did you get this? A variable can be very important for the
multivariate model and not look at all good by itself. Also, a
variable can be the one giving the best fit by itself and be
irrelevant for the multivariate model. An example of the latter for
linear models is a variable equal to the prediction of the best model
plus a small error: it fits best on its own, yet adds nothing once the
variables in that model are included.

>The first variable to include shall be the one with the greatest
>difference of -2Log likelihood (intercept - intercept and
>covariates). The second variable to include shall be the greatest
>difference of this value but now with the first variable in the model.
>Next the variable that is already in the model is checked on its
>validity with the new variable added. This carries on till the
>difference of -2log likelihood and the last model have no significant
>difference. This would mean that no variable could be added or
>removed from the model. Then possible interaction terms are tested by
>adding one by one to see its significance.

Stepwise regression is far from optimal.

>I have done this but found that the fit of the model is still not very
>good. But when I remove one of the independent variables that is found
>to be highly significant (Pr < 0.0001, and largest difference of -2log
>likelihood), the fit of the model improved greatly. So my questions
>are:

Was this before or after the other variables were included? In any
case, I do not see how removing a variable, assuming that there are a
fair number of degrees of freedom and not too many parameters, can
improve the fit. It might pay to look at the likelihood function, not
just the MLE. Sometimes the shape for non-linear models is rather odd.

And above all, statistical significance is a very poor criterion.

>1) What contributes to the above scenario?
>2) If the fit of the model obtained by the method above is found to be
>not good, what can we conclude about the independent variables that are
>used to form the model?
>3) If the fit of the model is good but some of the variables in the
>model are not significant or there are still unknown/unidentified
>variables that are significant, is this model better than the above one?
>And what can we then conclude from this model?

>Thanks for all valuable comments.

-- 
This address is for information only. I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Department of Statistics, Purdue University
[EMAIL PROTECTED]    Phone: (765)494-6054    FAX: (765)494-0558
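
As an illustration of the point about univariate screening, here is a small simulation sketch. It uses ordinary linear least squares rather than logistic regression, since that is where the example in the reply was stated, and every name and number in it (signal, nuisance, the noise level 0.1, the sample size) is an assumption made up for the demonstration, not something from the original question. It constructs one variable (x2) that looks useless by itself but completes the two-variable model, and another (near_pred) that gives the best single-variable fit yet cannot improve on the two-variable model.

```python
import math
import random


def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def r2_one(y, x):
    """R^2 of the least-squares fit of y on one predictor (with intercept)."""
    return corr(y, x) ** 2


def r2_two(y, x1, x2):
    """R^2 of the least-squares fit of y on two predictors (with intercept),
    written in terms of the three pairwise correlations."""
    r1, r2, r12 = corr(y, x1), corr(y, x2), corr(x1, x2)
    return (r1 * r1 + r2 * r2 - 2 * r1 * r2 * r12) / (1 - r12 * r12)


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 2000                # hypothetical sample size

signal = [rng.gauss(0, 1) for _ in range(n)]
nuisance = [rng.gauss(0, 1) for _ in range(n)]

x1 = [s + u for s, u in zip(signal, nuisance)]  # signal contaminated by nuisance
x2 = nuisance                                   # unrelated to y on its own
y = signal                                      # so y = x1 - x2 exactly

# "Prediction of the best model plus a small error" (0.1 is an assumed noise level)
near_pred = [v + rng.gauss(0, 0.1) for v in y]

r2_x1 = r2_one(y, x1)
r2_x2 = r2_one(y, x2)
r2_near = r2_one(y, near_pred)
r2_both = r2_two(y, x1, x2)

print("x1 alone:        R^2 =", round(r2_x1, 3))
print("x2 alone:        R^2 =", round(r2_x2, 3))
print("near_pred alone: R^2 =", round(r2_near, 3))
print("x1 and x2:       R^2 =", round(r2_both, 3))
```

A univariate screen would discard x2 and would rank near_pred first, although x1 and x2 together reproduce y exactly and near_pred has nothing left to add. The same kind of behaviour can occur with logistic models, though this sketch does not demonstrate that case.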
