On Wed, 12 Feb 2003 07:35:59 +0100, "Rafal" <[EMAIL PROTECTED]> wrote:
> Hello
>
> I have a network of sample plots distributed in a regular grid in
> researched area. On each plot I measured my dependent variable and set of
> 'independent' (as independent as real world relations can be where
> everything seems to be related to everything).

Well, you are crossing the two meanings of 'independent'.
I try to ignore the one where it is the contrast to 'dependent',
which means 'predicted'. But *you* need to pay special attention
to the other meaning, where 'independent' says that the
observations are not part of an autocorrelated series, such as
Time (in one dimension) or Geography (in 2D). It sounds as if
you will not have good tests, owing to this sort of dependency.

> Now I am using multivariate
> regression to create a model allowing to estimate dependent variable from
> set of independent. Dependent variable has normal distribution. The most
> correlated with dependent variable seems to be height above sea level
> although its distribution is very different from normal because of shape of
> area where research was made. My questions are:
> 1) Should I try to transform it to obtain distribution more similar to normal

To (1): Probably, No. Thom posted one reason, that it is the
residual that matters, not the raw distribution.

Is there a rational basis for transformation, relative to what
you are studying? - for instance, if "altitude" matters because
of absolute air pressure, I imagine that "air pressure" might be
related as a log-function instead of the raw measurements.

A transformation that is totally arbitrary -- I mostly feel that
way about what is called the "started logarithm", log(x+c) -- is
going to be hard to explain, unless you can tuck it into a
black-box of "fuzzy logic" or "neural nets".

> as literature suggests?

- And, you want to be careful, in that reading of the literature.
Arbitrary transformations, into rank-orders or otherwise, are
suggested for the purpose of providing a powerful uni-variate test.
If you are moving to modeling, you usually want only the rational
transformations (as I just described), so the model will
maintain sense.

> 2) When forward or backward stepwise regression is used, is there a limit
> for the number of independent variables one can take into procedure? Does
> the literature which suggest taking 5 to 20 times less variables than cases
> speaks about final equation obtained from finished regression procedure or
> amount before stepwise procedure?

Check my stats-faq for comments posted a few years ago, by
various people, on why stepwise regression is a bad idea for
most applications. You could also search sci.stat.* with
groups.google.com for more recent comments.

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html
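
The point made under (1), that it is the residual that matters and
not the raw distribution of a predictor, can be illustrated with a
small simulation. What follows is only a sketch under stated
assumptions: the data are simulated, not taken from the plots in the
question, and the names (elevation, response, fit_line, skewness) and
all numeric values are hypothetical. It uses a single predictor and
ignores the spatial autocorrelation issue entirely.

```python
import random
import statistics


def skewness(values):
    """Third standardized moment; close to 0 for symmetric data."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 2000

# Assumption: a strongly right-skewed predictor (exponential, mean 300).
elevation = [rng.expovariate(1 / 300.0) for _ in range(n)]

# Assumption: the response is linear in elevation with normal errors.
response = [10.0 + 0.02 * e + rng.gauss(0.0, 1.0) for e in elevation]

intercept, slope = fit_line(elevation, response)
residuals = [yi - (intercept + slope * xi)
             for xi, yi in zip(elevation, response)]

skew_predictor = skewness(elevation)
skew_residuals = skewness(residuals)

print(f"slope estimate:      {slope:.4f}")
print(f"skewness, predictor: {skew_predictor:.2f}")
print(f"skewness, residuals: {skew_residuals:.2f}")
```

In a run like this the predictor stays clearly skewed while the
residuals come out roughly symmetric, and the slope is recovered
without any transformation of the predictor. That is the sense in
which a non-normal predictor is not, by itself, a reason to transform.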
