Thanks for your very helpful reply. Donald Burrill wrote: <snip>
> Here you speak of estimating a mean (or median), and of efficiency in
> estimating that mean (or median).  But if THAT's all you want to do, why
> go to the bother of regression analysis?  The usual univariate estimates
> (sample mean, sample median) will do as well.  A regression, of whatever
> stripe, is used to estimate the parameters of a mathematical model
> relating the response variable to the predictor(s).  For me to try to
> interpret your paragraph in a way that would make sense, would be for me
> to put words in your mouth.  Better for you to do that, especially as
> doing so may help clarify your thought on all this.

You're right, that was not a well-written question. To clarify: suppose I have data such that for each value of the independent variable x, y takes on values that are normally distributed with mean a*x + b (i.e. E[Y|X=x] = a*x + b for constants a and b). Now suppose we do a regression to determine the values of a and b. (Of course, if we knew the data were exactly normally distributed, we could simply estimate the mean of y at two values of x and solve for a and b, but let's assume we're doing a regular regression.) If we then did one regression minimizing squared errors and one minimizing absolute errors, the two fitted lines would converge to the same line as the sample size goes to infinity; that is, both types of regression estimate the mean of Y at each value of X = x. However, as the sample size increases, it is my understanding that the squared-error estimate of the line would approach the "correct" line faster than the absolute-error version would.

In other words, suppose I have a process that first generates a random x value and then generates a Y value from a normal distribution with mean a*x + b (a computer could easily be programmed to do this, pseudo-randomness aside). The true values of a and b are unknown to me. After each new (X, Y) pair is generated, I perform two regressions using all the data generated so far: the first minimizes squared errors, the second minimizes absolute errors. My contention is that both regressions will get closer and closer to the line a*x + b used to generate the data, but that the squared-error estimate will converge to that line faster.

I hope that's better worded. Assuming you can understand what I'm saying, is it correct?
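
If it helps to make the comparison concrete, here is a rough Python sketch of the experiment I have in mind. Everything in it is illustrative: the true slope and intercept, the noise level, the sample sizes, and the use of iteratively reweighted least squares as a stand-in for an exact least-absolute-deviations fit are just convenient choices on my part, not anything established above.

import numpy as np

rng = np.random.default_rng(0)
a_true, b_true = 2.0, 1.0   # illustrative "unknown" slope and intercept

def fit_ols(x, y):
    # Ordinary least squares: minimize the sum of squared errors.
    A = np.column_stack([x, np.ones_like(x)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef  # (slope, intercept)

def fit_lad(x, y, n_iter=50, eps=1e-8):
    # Least absolute deviations, approximated here by iteratively
    # reweighted least squares (an exact fit would use linear programming).
    A = np.column_stack([x, np.ones_like(x)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)   # start from the OLS fit
    for _ in range(n_iter):
        w = 1.0 / np.maximum(np.abs(y - A @ coef), eps)
        sw = np.sqrt(w)
        coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return coef

# Compare how close each fitted line is to the true line as n grows.
for n in (50, 200, 1000, 5000):
    err_ols, err_lad = [], []
    for _ in range(200):                      # 200 replications per sample size
        x = rng.uniform(0.0, 10.0, n)
        y = a_true * x + b_true + rng.normal(0.0, 1.0, n)
        for fit, errs in ((fit_ols, err_ols), (fit_lad, err_lad)):
            a_hat, b_hat = fit(x, y)
            errs.append((a_hat - a_true) ** 2 + (b_hat - b_true) ** 2)
    print("n=%5d   mean sq. parameter error:  OLS %.5f   LAD %.5f"
          % (n, np.mean(err_ols), np.mean(err_lad)))

My expectation is that both error columns shrink toward zero as n grows, but that the OLS column stays smaller (if I have the theory right, asymptotically by the same 2/pi efficiency factor that relates the sample median to the sample mean under normality), which is what I mean by converging faster.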
