On Wed, 28 May 2003, Scheltema, Karen wrote:

> I know about the perils of stepwise, and I agree with you that it is
> a less than desirable procedure.  This researcher, however, is not
> as convinced as I am about not doing stepwise.  Sigh.  He has more
> variables than would comfortably fit a 5-1 case to variable ratio
> for a forced entry regression, which is why he was hoping stepwise
> would help him narrow his model.  Any suggestions I can give him,
> short of telling him to scrap everything?

Hi, Karen.  What if any <intelligence> was he applying to the problem
of selecting variables?  For example, when a variable is dropped, it
will have been dropped because its partial correlation with the
dependent residual at that point is smaller than the largest competing
partial correlation.  This can sound like a reasonable basis for
discarding the variable.  However, if the two partial correlations in
question are, say, 0.5345 and 0.5346, one might prefer to decide
between THOSE two variables on a basis other than size of partial r.

For reasons like this, if one is going to do stepwise at all, one
should have a close look at each step and the decision made therein;
and should probably run several different stepwise regressions, with
different starting points (e.g., one with all variables in [backwards]
as described; one with no variables in [forwards] as is more usual;
several with different subsets of the candidate predictors in the
equation initially).  This will give at least some idea of how heavily
the results in the first instance were, as they say, "capitalizing on
chance", and how stable those results may be thought to be.  And it
may be clearer where one would REALLY like to apply some intelligence
(as distinct from computational crank-turning) to the process.

As Mike Babyak pointed out, orthogonalizing at least the interaction
terms with respect to their lower-order components would be sensible.
(And if you want to see some REALLY large VIFs, look at my Minitab
white paper.  Initially Minitab would not accept all fifteen
predictors and insisted on throwing one of them out, because the VIFs
were so high (or, equivalently, the tolerances were so low).  I had to
reset the tolerance threshold to something quite ridiculously low just
to get all of them in, when they (and their products) were in their
original form.)

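A minimal sketch of that orthogonalizing step, in Python with NumPy
(an assumption for illustration; the white paper mentioned above used
Minitab).  It residualizes a product term on its lower-order
components and compares VIFs before and after.  The data are
simulated, and the helper names vif and residualize are hypothetical,
not taken from any package.

```python
import numpy as np


def vif(X):
    """Variance inflation factor of each column of X.

    Each column is regressed on an intercept plus all other columns;
    VIF = 1 / (1 - R^2), and tolerance is its reciprocal.
    """
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        centered = y - y.mean()
        r2 = 1.0 - (resid @ resid) / (centered @ centered)
        out[j] = 1.0 / (1.0 - r2)
    return out


def residualize(target, lower_order):
    """Part of `target` orthogonal to an intercept and `lower_order`."""
    Z = np.column_stack([np.ones(len(target)), lower_order])
    beta, *_ = np.linalg.lstsq(Z, target, rcond=None)
    return target - Z @ beta


# Simulated predictors with means far from zero (illustrative only).
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(50.0, 5.0, n)
x2 = rng.normal(30.0, 3.0, n)

# Raw product term versus the same term orthogonalized on x1 and x2.
raw = np.column_stack([x1, x2, x1 * x2])
x12_orth = residualize(x1 * x2, np.column_stack([x1, x2]))
orth = np.column_stack([x1, x2, x12_orth])

vif_raw = vif(raw)
vif_orth = vif(orth)
print("VIFs, raw product term:           ", vif_raw)
print("VIFs, orthogonalized product term:", vif_orth)
```

With predictors this far from zero, the raw product is nearly a linear
combination of its components, so its VIF is large; the residualized
product is uncorrelated with x1 and x2 by construction, so its VIF is
essentially 1.
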
In addition, if the investigator has a way of ranking, or rating, the
candidate predictors in importance (theoretical or other), it may be
informative to take them in order (most important first) and
orthogonalize each of them with respect to all the preceding ones.
(Whether to take them one at a time or (perhaps occasionally) several
in a bunch I cannot advise you, not knowing the theoretical and
substantive context.)

Hope this has been helpful.  Good luck!
                                        -- Don.
-----------------------------------------------------------------------
Donald F. Burrill                                    [EMAIL PROTECTED]
56 Sebbins Pond Drive, Bedford, NH 03110                (603) 626-0816
