Hi, can you recommend a good method for choosing the best model for a 12-variable logistic model? Should I always try to get a model with a very good fit (Hosmer & Lemeshow goodness-of-fit test, using SAS)?
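[The Hosmer & Lemeshow test asked about above is what SAS's PROC LOGISTIC reports with the LACKFIT option. As a rough, language-neutral sketch of what that statistic does, here is a minimal Python version; the function name and binning are illustrative, not SAS's exact implementation:]

```python
import numpy as np
from scipy.stats import chi2

def hosmer_lemeshow(y, p, g=10):
    """Sketch of the Hosmer-Lemeshow test: bin observations into g groups
    by predicted probability, then compare observed vs. expected event
    counts with a chi-square statistic on g - 2 degrees of freedom."""
    order = np.argsort(p)
    y, p = np.asarray(y)[order], np.asarray(p)[order]
    groups = np.array_split(np.arange(len(p)), g)
    stat = 0.0
    for idx in groups:
        obs = y[idx].sum()          # observed events in this bin
        exp = p[idx].sum()          # expected events under the model
        n = len(idx)
        denom = exp * (1 - exp / n)  # n * pbar * (1 - pbar)
        if denom > 0:               # guard against degenerate bins
            stat += (obs - exp) ** 2 / denom
    return stat, chi2.sf(stat, g - 2)
```

[A large p-value here means only "no detected lack of fit", not that the model is the best available; the thread below explains why chasing fit statistics through variable selection is risky.]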
Rich Ulrich <[EMAIL PROTECTED]> wrote in message news:<[EMAIL PROTECTED]>...

> I posted this to sci.stat.consult, only, before noting the same
> question had been separately posted to sci.stat.edu
> and sci.stat.math, too. - this is posted to the latter two.
>
> On 16 Jan 2004 16:31:28 -0800, [EMAIL PROTECTED] (Koh Puay Ping)
> wrote:
>
> > Hi all, I have a question on logistic regression.
> >
> > When we are finding a multivariate model (using proc logistic, SAS), I
> > understand that we should perform univariate analysis first to
> > identify the variables with (Pr < 0.25) for multivariate modelling.
>
> Well. No. Univariate variable-screening for model-building
> has fundamental problems, unless the project is totally
> exploratory -- or you have a large surplus of cases.
> "Stepwise" selection of variables is not a respected technique
> for most purposes. Pre-selection, using the univariate tests,
> eliminates some aspects of confounding, which may be good
> or bad, but it also means that you can't use the nominal
> statistical tests later on.
>
> You can see my stats-FAQ for comments and references.
>
> [ snip, description of stepwise entry, including incomplete
> description of incorporation of interactions. ]
>
> > I have done this but found that the fit of the model is still not very
> > good. But when I remove one of the independent variables that is found
> > to be highly significant (Pr < 0.0001, and largest difference of
> > -2 log likelihood), the fit of the model improved greatly. So my
> > questions are:
>
> That is another mis-statement. You say that when you take
> out the variable that has the largest partial contribution to
> the "fit", the "fit" is improved. That is a contradiction of
> statistical terms, since the *legitimate* indicator of fit,
> precisely, is that -2 log term. So, I assume that you are
> picking up some other criterion of fit that is not so essential,
> such as "group assignment".
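[The point that -2 log likelihood is the legitimate fit indicator can be made concrete: for nested models fitted by maximum likelihood, dropping a predictor can never decrease the deviance. A minimal sketch, using a plain Newton-Raphson fit in Python rather than PROC LOGISTIC; all names are illustrative:]

```python
import numpy as np

def fit_logistic(X, y, iters=100):
    """Plain Newton-Raphson MLE for logistic regression; returns the
    coefficients and the -2 log-likelihood (deviance) at the optimum."""
    Xb = np.column_stack([np.ones(len(y)), X])  # prepend an intercept
    beta = np.zeros(Xb.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-Xb @ beta))
        W = p * (1 - p)                          # IRLS weights
        grad = Xb.T @ (y - p)
        hess = (Xb * W[:, None]).T @ Xb
        # small ridge term keeps the solve numerically stable
        beta = beta + np.linalg.solve(hess + 1e-8 * np.eye(len(beta)), grad)
    p = 1 / (1 + np.exp(-Xb @ beta))
    eps = 1e-12
    m2ll = -2 * np.sum(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    return beta, m2ll
```

[Fitting the full model and the model with the strongest predictor removed, the full model's -2 log likelihood is always at least as small; the difference is the likelihood-ratio chi-square. So any "improvement" seen after dropping that variable must be coming from a different criterion, such as group assignment.]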
> Assigning the correct group
> is an extra statistical report that regression programs
> usually provide, these days; but it is not essential to
> the statistical part of the procedure.
>
> So, yes, that can happen. If you were looking at a properly
> established regression equation, and happened to see this
> when you dropped one variable (for some reason),
> I would think that it is very likely to indicate that you
> have some outliers (say) or another distributional artifact
> affecting the results. But you don't have a decent equation,
> from what you have said.
>
> Funny peripheral things like this are also common once
> you have over-fitted a data set. Since you got to this point
> by "stepwise", overfitting is another strong possibility.
>
> As I said, check my stats-FAQ; check those references;
> or you may want to use keywords in groups.google.com
> to search the sci.stat.* groups.

=================================================================
Instructions for joining and leaving this list, remarks about the
problem of INAPPROPRIATE MESSAGES, and archives are available at:
http://jse.stat.ncsu.edu/
=================================================================
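[The warning above about univariate pre-screening can be illustrated by simulation: even when every predictor is pure noise, screening at p < 0.25 still "selects" a sizeable subset, which is why nominal tests on the final model are no longer trustworthy. A rough sketch, with two-sample t-tests standing in for the univariate logistic screens; counts are random, not fixed:]

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n, k = 100, 50
X = rng.normal(size=(n, k))        # pure noise: no predictor relates to y
y = rng.binomial(1, 0.5, size=n)   # outcome independent of all predictors

# Univariate screen at p < 0.25, per predictor
selected = [j for j in range(k)
            if stats.ttest_ind(X[y == 1, j], X[y == 0, j]).pvalue < 0.25]

# On the order of 0.25 * 50 = 12.5 noise variables are expected to survive
print(len(selected))
```

[Any multivariate model built on the survivors then looks better than it should, because the screening step has already cherry-picked the noise that happens to correlate with y in this sample.]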
