Hi,

Can you recommend a good method for choosing the best model for a
12-variable logistic regression?  Should I always try to get a model
with very good fit (Hosmer & Lemeshow goodness-of-fit test in SAS)?
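For reference, the Hosmer & Lemeshow statistic mentioned above is simple to compute by hand.  A minimal sketch in plain Python (not SAS; the decile grouping and the skip rule for degenerate groups are illustrative choices, not anyone's official implementation):

```python
# Hosmer-Lemeshow goodness-of-fit statistic, sketched in plain Python.
# Inputs: predicted probabilities p and 0/1 outcomes y; g risk groups.
import math

def hosmer_lemeshow(p, y, g=10):
    pairs = sorted(zip(p, y))                    # sort cases by predicted risk
    n = len(pairs)
    stat = 0.0
    for k in range(g):
        chunk = pairs[k * n // g:(k + 1) * n // g]
        if not chunk:
            continue
        obs = sum(yi for _, yi in chunk)         # observed events in group
        exp = sum(pi for pi, _ in chunk)         # expected events in group
        m = len(chunk)
        if 0 < exp < m:                          # skip degenerate groups
            stat += (obs - exp) ** 2 / (exp * (1 - exp / m))
    return stat                                  # refer to chi-square, g-2 df
```

In SAS, PROC LOGISTIC reports this when you request the lack-of-fit option; the sketch just shows what the number is: a chi-square comparison of observed vs. expected events across deciles of predicted risk.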



Rich Ulrich <[EMAIL PROTECTED]> wrote in message news:<[EMAIL PROTECTED]>...
> I posted this to sci.stat.consult only, before noting that the same
> question had been separately posted to sci.stat.edu
> and sci.stat.math, too.  This reply goes to the latter two.
> 
> On 16 Jan 2004 16:31:28 -0800, [EMAIL PROTECTED] (Koh Puay Ping)
> wrote:
> 
> > Hi all, I have a question on logistic regression.
> > 
> > When we are finding a multivariate model (using proc logistic, SAS), I
> > understand that we should perform univariate analysis first to
> > identify the variables with (Pr < 0.25) for multivariate modelling.
> 
> Well.  No.  Univariate variable-screening for model-building
> has fundamental problems, unless the project is totally
> exploratory -- or you have a large surplus of cases.
> "Stepwise" selection of variables is not a respected technique
> for most purposes.  Pre-selection using the univariate tests
> eliminates some aspects of confounding, which may be good
> or bad, but it also means that you can't use the nominal
> statistical tests later on.
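The screening problem is easy to see by simulation.  A sketch in plain Python, using a simple two-sample z-test as a stand-in for the univariate logistic fits (sample sizes and the 0.25 cutoff are the illustrative assumptions here): with predictors that are pure noise, roughly a quarter of them will still pass the p < 0.25 screen, so the variables that reach the final model were not selected at random and their nominal p-values no longer mean what they claim.

```python
# Simulation sketch: pure-noise predictors vs. a p < 0.25 univariate screen.
import math
import random

random.seed(1)

def z_pvalue(z):
    # two-sided normal p-value (large-sample approximation)
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

n, n_vars = 200, 100
y = [random.randint(0, 1) for _ in range(n)]      # arbitrary 0/1 outcome
kept = 0
for _ in range(n_vars):
    x = [random.gauss(0, 1) for _ in range(n)]    # noise, unrelated to y
    g1 = [xi for xi, yi in zip(x, y) if yi == 1]
    g0 = [xi for xi, yi in zip(x, y) if yi == 0]
    diff = sum(g1) / len(g1) - sum(g0) / len(g0)
    se = math.sqrt(1 / len(g1) + 1 / len(g0))     # known unit variance
    if z_pvalue(diff / se) < 0.25:
        kept += 1
print(kept, "of", n_vars, "noise variables pass the 0.25 screen")
```

By construction none of these variables has any relationship to the outcome, yet a sizable fraction survives screening; any test applied to the survivors is conditioned on that selection.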
> 
> You can see my stats-FAQ  for comments and references.
> 
> [ snip, description of stepwise entry; including incomplete
> description of incorporation of interactions.]
> 
> > 
> > I have done this but found that the fit of the model is still not very
> > good. But when i remove one of the independent variables that is found
> > to be highly significant (Pr< 0.0001, and largest difference of -2log
> > likelihood), the fit of the model improved greatly. So my questions
> > are:
> 
> That is another mis-statement.  You say that when you take
> out the variable that has the largest partial contribution to
> the "fit", the "fit" is improved.  That is a contradiction in
> statistical terms, since the *legitimate* indicator of fit,
> precisely, is that -2 log-likelihood term.  So, I assume that you
> are picking up some other criterion of fit that is not so essential,
> such as "group assignment".  Assigning cases to the correct group
> is an extra statistical report that regression programs
> usually provide, these days; but it is not essential to
> the statistical part of the procedure.
> 
> So, yes, that can happen.  If you were looking at a properly
> established regression equation, and happened to see this
> when you dropped one variable (for some reason),
> I would think it very likely to indicate that you
> have some outliers (say) or some other distributional artifact
> affecting the results.  But you don't have a decent equation,
> from what you have said.
> 
> Funny peripheral things like this are also common once
> you have over-fitted a data set.  Since you got to this point
> by "stepwise" selection, overfitting is another strong possibility.
> 
> As I said, check my stats-FAQ; check those references; 
> or  you may want to use keywords in groups.google.com
> to check the sci.stat.*  groups.
=================================================================
Instructions for joining and leaving this list, remarks about the
problem of INAPPROPRIATE MESSAGES, and archives are available at:
.                  http://jse.stat.ncsu.edu/                    .
=================================================================
