I posted this to sci.stat.consult only, before noticing that
the same question had been separately posted to sci.stat.edu
and sci.stat.math, too.  This copy goes to the latter two.

On 16 Jan 2004 16:31:28 -0800, [EMAIL PROTECTED] (Koh Puay Ping)
wrote:

> Hi all, I have a question on logistic regression.
> 
> When we are finding a multivariate model (using proc logistic, SAS), I
> understand that we should perform univariate analysis first to
> identify the variables with (Pr < 0.25) for multivariate modelling.

Well.  No.  Univariate variable-screening for model-building
has fundamental problems, unless the project is purely
exploratory, or you have a large surplus of cases.
"Stepwise" selection of variables is not a respected technique
for most purposes.  Pre-selection using the univariate tests
removes some aspects of confounding from consideration, which
may be good or bad; but it also means that you cannot use the
nominal statistical tests later on, because the printed
p-values take no account of the selection that has already
been done on the same data.
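
For concreteness, here is a sketch of the procedure you
describe, with invented data-set and variable names (the
SLENTRY=0.25 echoes your univariate cutoff); this is the
approach I am advising against:

  /* Univariate screens: one single-predictor model per
     candidate; keep the predictors with Pr < 0.25. */
  proc logistic data=mydata;
     model y(event='1') = x1;
  run;
  /* ... repeated for x2, x3, and so on. */

  /* Stepwise selection among the survivors.  SLENTRY and
     SLSTAY are the entry and stay significance levels. */
  proc logistic data=mydata;
     model y(event='1') = x1 x3 x4
           / selection=stepwise slentry=0.25 slstay=0.10;
  run;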

You can see my stats-FAQ for comments and references.

[ snip: description of stepwise entry, including an incomplete
description of how interactions were incorporated. ]

> 
> I have done this but found that the fit of the model is still not
> very good.  But when I remove one of the independent variables that
> is found to be highly significant (Pr < 0.0001, and the largest
> difference in -2 log likelihood), the fit of the model improved
> greatly.  So my questions are:

That is another mis-statement.  You say that when you take
out the variable with the largest partial contribution to the
"fit", the "fit" improves.  That is a contradiction in
statistical terms, since the *legitimate* indicator of fit,
precisely, is that -2 log likelihood term, and removing the
variable that contributes most to it can only make it worse.
So I assume that you are picking up some other criterion of
fit that is not so essential, such as "group assignment".
Assigning cases to the correct group is an extra report that
regression programs usually provide these days, but it is not
essential to the statistical part of the procedure.
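
If you want to see the two criteria side by side, here is a
minimal sketch (names invented again).  PROC LOGISTIC reports
"-2 Log L" under Model Fit Statistics, and the CTABLE option
requests the separate classification table:

  /* Full model, plus the (non-essential) classification
     table at a 0.5 cut-point. */
  proc logistic data=mydata;
     model y(event='1') = x1 x2 x3 / ctable pprob=0.5;
  run;

  /* Reduced model, with the suspect variable x3 dropped. */
  proc logistic data=mydata;
     model y(event='1') = x1 x2 / ctable pprob=0.5;
  run;

The rise in -2 Log L from the first fit to the second is the
likelihood-ratio chi-square (1 df) for x3; dropping x3 can
only raise -2 Log L, whatever happens to the classification
counts.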

So, yes, that can happen.  If you were looking at a properly
established regression equation, and happened to see this when
you dropped one variable (for some reason), I would think it
very likely indicates that you have some outliers, say, or
some other distributional artifact affecting the results.  But
from what you have said, you do not have a decent equation.
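
If you do want to chase the outlier possibility, the INFLUENCE
option prints case-level diagnostics; a sketch, again with
invented names:

  /* Case-level regression diagnostics: Pearson and deviance
     residuals, leverage (hat diagonal), and DFBETAS. */
  proc logistic data=mydata;
     model y(event='1') = x1 x2 x3 / influence;
  run;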

Funny peripheral symptoms like this are also common once you
have over-fitted a data set.  Since you got to this point by
"stepwise," overfitting is another good possibility.

As I said, check my stats-FAQ and those references; or you may
want to search the sci.stat.* groups by keyword at
groups.google.com.


-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html
"Taxes are the price we pay for civilization." 