Sorry for the confusion
*****************************************
This morning's message
Yesterday I received a query asking about the sampling distribution of beta, the standardized regression coefficient. I thought that I knew the answer, but decided that I should check. Now I am confused and in need of help.
Let's simplify the problem and look only at the sampling distribution of b for the moment. (One is just a linear transformation of the other.)
Hogg and Craig (1978, p. 297) state that since a and b are linear functions of Y, they are each normally distributed. They base this on a variance sum law argument. Tamhane and Dunlop (2000, p. 358) show a proof using the same argument.
We also know, from a million sources, that the sampling distribution of (b-b*) divided by se(b) follows a t distribution, where b* represents the parameter. Now the numerator of a t distribution is a normally distributed variable, so (b - b*) must be normally distributed. Since b* is a constant, then b must be normally distributed.
That would seem to settle the question--b is normally distributed.
BUT, suppose that we standardized our variables. This is a linear transformation for both X and Y, so it would change neither the correlation nor the shape of the distributions. With standardized variables the slope is equal to the standardized slope (beta). And it is easy to show that beta, with only one predictor, is equal to the correlation coefficient. So when we take standardized variables, which are just linear transformations of unstandardized variables, r, b, and beta will all be numerically equal.
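The numerical equality claimed here can be checked on a single sample. The following is only a minimal sketch using the Python standard library; the data (30 cases, a slope of 0.8, the noise level, and the seed) are made up purely for illustration, and the helper names are hypothetical.

```python
import math
import random

def mean(v):
    return sum(v) / len(v)

def sd(v):
    # Sample standard deviation (n - 1 in the denominator).
    m = mean(v)
    return math.sqrt(sum((a - m) ** 2 for a in v) / (len(v) - 1))

def slope(x, y):
    # Ordinary least-squares slope of y on x.
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def pearson_r(x, y):
    # Pearson correlation coefficient.
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def standardize(v):
    # Convert to z scores using the sample mean and sample sd.
    m, s = mean(v), sd(v)
    return [(a - m) / s for a in v]

# Illustrative data only: these values are assumptions, not from the post.
rng = random.Random(1)
x = [rng.gauss(50, 10) for _ in range(30)]
y = [0.8 * a + rng.gauss(0, 8) for a in x]

b = slope(x, y)                                # raw slope
beta = slope(standardize(x), standardize(y))   # slope on z scores
r = pearson_r(x, y)                            # correlation

print("b    =", round(b, 4))
print("beta =", round(beta, 4))
print("r    =", round(r, 4))
```

In this sketch beta and r agree to rounding error, and b differs from them only by the factor sd(y)/sd(x) computed from the same sample.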
Now, we all know that the sampling distribution for r is skewed when r* (the parameter) is not 0. If we let r* = .60, the skew is quite noticeable. But if r and beta are equal, then the sampling distribution of beta will be equal to the sampling distribution of r, and therefore it will also be skewed. And if the sampling distribution of beta is skewed, so should be the sampling distribution of b, because it is simply a linear transformation of beta.
So now I have shown that the sampling distribution is skewed, though I began by quoting experts I respect saying that it is normal.
Finally, I used Resampling Stats to empirically generate the sampling distribution. It is very definitely skewed.
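For readers without Resampling Stats, a simulation of this general kind can be sketched in plain Python. This is not the original program. It assumes bivariate normal data with r* = .60, a sample size of 20, 5000 replications, and a fixed seed, none of which are stated above. In each replication both X and Y are drawn afresh and r is computed, which with one predictor equals the slope on standardized variables.

```python
import math
import random

def pearson_r(x, y):
    # Pearson correlation; with one predictor this equals the
    # slope computed on standardized variables.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def sample_r(rng, n, rho):
    # Draw n pairs from a bivariate normal with correlation rho
    # (an assumed population) and return the sample r.
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [rho * a + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1) for a in x]
    return pearson_r(x, y)

def skewness(v):
    # Moment-based skewness: third central moment over
    # the second central moment raised to the power 3/2.
    n = len(v)
    m = sum(v) / n
    m2 = sum((a - m) ** 2 for a in v) / n
    m3 = sum((a - m) ** 3 for a in v) / n
    return m3 / m2 ** 1.5

# Assumed settings: n = 20, rho = .60, 5000 replications, fixed seed.
rng = random.Random(12345)
rs = [sample_r(rng, n=20, rho=0.60) for _ in range(5000)]

mean_r = sum(rs) / len(rs)
skew = skewness(rs)

print("mean of r:    ", round(mean_r, 3))
print("skewness of r:", round(skew, 3))
```

A negative skewness value here corresponds to the long lower tail described above. Changing n or rho shows how the amount of skew depends on those assumed settings.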
So where did I go wrong? The sampling distribution of b cannot be both normal and skewed at the same time.
Thanks,
Dave Howell
David C. Howell
Professor Emeritus
University of Vermont
New address:
3007 Barton Point Circle
Austin, TX 78733
http://www.uvm.edu/~dhowell/StatPages/StatHomePage.html
http://www.uvm.edu/~dhowell/gradstat/index.html
