Title: RE: VIF for dichotomous variable
Thanks to everyone for their helpful and informative advice.  He used the partial orthogonalization, and the VIFs behaved nicely.
 

Karen Scheltema, M.A, M.S.
Senior Statistician
HealthEast
Research and Education Department, Midway Campus
1700 University Ave W
St. Paul, MN 55104
Ph: (651) 232-5212   fax: (651) 641-0683
mailto:[EMAIL PROTECTED]

-----Original Message-----
From: Simon, Steve, PhD [mailto:[EMAIL PROTECTED]
Sent: Wednesday, May 28, 2003 5:18 PM
To: Scheltema, Karen; Ed Stat (E-mail)
Subject: RE: VIF for dichotomous variable

Karen Scheltema writes:

> I know about the perils of stepwise, and I agree with you that
> it is a less than desirable procedure. This researcher,
> however, is not as convinced as I am about not doing stepwise. Sigh.
> He has more variables than would comfortably fit a 5-1 case to
> variable ratio for a forced entry regression, which is why he was
> hoping stepwise would help him narrow his model. Any suggestions I
> can give him, short of telling him to scrap everything?

The five to one ratio (actually, I've heard ten to one or fifteen to one) refers to the number of candidate variables and not the number of variables in the final stepwise model. The original papers that proposed these ratios were written to see how well stepwise regression would do. And stepwise regression is highly unstable and often selects the wrong variables when the number of variables going into stepwise (not the number coming out) is large relative to the number of observations.
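
A rough way to see why the candidate count matters is a small simulation. The sketch below is not stepwise regression itself; it only mimics the first step of a forward selection, picking the candidate most correlated with the outcome. Every variable is pure noise, so any apparent relationship is spurious. The function name, sample size, and candidate counts are all made up for illustration.

```python
import numpy as np

def mean_best_abs_corr(n_obs, n_candidates, n_reps=200, seed=0):
    """Average, over simulated data sets, of the largest |correlation|
    between a pure-noise outcome and any of the pure-noise candidates."""
    rng = np.random.default_rng(seed)
    best = np.empty(n_reps)
    for r in range(n_reps):
        X = rng.normal(size=(n_obs, n_candidates))  # candidates: noise only
        y = rng.normal(size=n_obs)                  # outcome: unrelated noise
        Xc = X - X.mean(axis=0)
        yc = y - y.mean()
        # Pearson correlation of each candidate column with y
        corr = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
        best[r] = np.abs(corr).max()
    return best.mean()

# Same 50 observations, different numbers of candidate variables
few = mean_best_abs_corr(n_obs=50, n_candidates=2)
many = mean_best_abs_corr(n_obs=50, n_candidates=25)
print(f"best |r| with  2 candidates: {few:.2f}")
print(f"best |r| with 25 candidates: {many:.2f}")
```

With more candidates going in, the "winning" variable looks more impressive even though nothing is really there, and that is why the ratio has to be judged on the variables entering the selection rather than on those coming out.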

So stepwise does not solve anything. The only real solution is to eliminate certain variables a priori using medical and scientific criteria.

Along with the relative lack of data, you also have high VIFs. Both of these indicate that the model is unstable and will be unlikely to replicate well with a different data set. A high VIF is actually another indication of a relative lack of data. His independent variables do not effectively fill up the k-dimensional space but instead fall close to a lower-dimensional subspace. In simple terms, some of the corners in his data space are empty.
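
For anyone who wants to check VIFs by hand, here is a minimal sketch. It uses the standard definition, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on all the other predictors. The helper name `vif` and the simulated data are assumptions for illustration and have nothing to do with the researcher's actual variables.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (n rows, k predictors)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])    # intercept + other predictors
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        ss_res = resid @ resid                       # unexplained variation
        ss_tot = ((target - target.mean()) ** 2).sum()
        # 1 / (1 - R^2) is the same as ss_tot / ss_res;
        # an exactly collinear column gives ss_res == 0 and a VIF of inf
        out[j] = ss_tot / ss_res
    return out

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)   # nearly a copy of x1
x3 = rng.normal(size=n)              # unrelated to the other two
vifs = vif(np.column_stack([x1, x2, x3]))
print(np.round(vifs, 1))
```

Here x1 and x2 lie almost on a line, so the data barely spread into the third dimension, and both get large VIFs; x3 stays close to 1.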

That's actually good news in a way. It means that his data set is good for generating hypotheses but not for confirming hypotheses. Try to get him to focus on exploratory models--draw lots of graphs and use words like "suggestive of a trend". Don't pretend that the confidence intervals and p-values are proving a whole lot. Try to include as few of these as possible in the final publication.

Don't focus on a single model. A series of single variable regression models may be more informative than a single multiple variable regression model.
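
As a sketch of what a series of single variable models could look like, the snippet below fits one straight line per predictor and reports the slope and R-squared for each. The helper `simple_fits`, the variable names, and the simulated data are hypothetical; in practice each fit would be read alongside a scatterplot rather than as a formal test.

```python
import numpy as np

def simple_fits(X, y, names):
    """Fit y on each column of X separately.
    Returns a list of (name, slope, r_squared) tuples."""
    results = []
    for j, name in enumerate(names):
        x = X[:, j]
        slope, intercept = np.polyfit(x, y, 1)   # degree-1 least squares fit
        r = np.corrcoef(x, y)[0, 1]              # in simple regression, R^2 = r^2
        results.append((name, slope, r ** 2))
    return results

rng = np.random.default_rng(1)
n = 100
X = rng.normal(size=(n, 2))
y = 2.0 * X[:, 0] + rng.normal(size=n)   # only the first predictor matters
fits = simple_fits(X, y, ["predictor_a", "predictor_b"])
for name, slope, r2 in fits:
    print(f"{name}: slope = {slope:.2f}, R^2 = {r2:.2f}")
```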

Don't worry whether you have the "right" model or not. Your model is almost certainly wrong. That's liberating. If all approaches are likely to yield the wrong results, then you can't be faulted for using (or not using) any particular approach. Just use any reasonable approach and if you put in a lot of caveats ("further study with a larger data set should be done") then you should be okay.

It's a rare data set that should be totally scrapped. It may only provide weak evidence, but it still helps point future researchers in the right direction. The only sin here is to pretend that this data is definitive or the final word.

Good luck!

Steve Simon, [EMAIL PROTECTED], Standard Disclaimer.
The STATS web page has moved to
http://www.childrens-mercy.org/stats.

P.S. There are a bunch of new methods that can handle model selection better than stepwise, but these might be overkill for your application. I'm just starting to look at these approaches (the lasso, bagging, and boosting) so I can't say anything other than they have cute names.

