Thanks for your very helpful reply.

Donald Burrill wrote:
<snip>

> 
> Here you speak of estimating a mean (or median), and of efficiency in
> estimating that mean (or median).  But if THAT's all you want to do, why
> go to the bother of regression analysis?  The usual univariate estimates
> (sample mean, sample median) will do as well.  A regression, of
> whatever stripe, is used to estimate the parameters of a mathematical
> model relating the response variable to the predictor(s).  For me to try
> to interpret your paragraph in a way that would make sense, would be for
> me to put words in your mouth.  Better for you to do that, especially
> as doing so may help clarify your thought on all this.

You're right.  That was not a well-written question.  To clarify: suppose I
have data such that for each value of the independent variable x, Y
is normally distributed with mean a*x + b (i.e., E[Y|X=x] = a*x + b
for constants a and b).  Now suppose we run a regression to estimate
a and b.  (Of course, if we knew that the data were exactly normally
distributed, etc., we could simply estimate the mean of Y at two values
of x, but let's assume we're doing an ordinary regression.)  If we fit
one regression by minimizing squared errors and another by minimizing
absolute errors, the two lines would converge to the same line as the
sample size goes to infinity.  That is to say, both types of regression
estimate the mean of Y at each value X=x (because the normal
distribution is symmetric, the conditional median targeted by the
absolute-error fit coincides with the conditional mean).  However, as
the sample size increases, it is my understanding that the squared-error
estimate of the line approaches the "correct" line faster than the
absolute-error estimate does.
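For reference, a standard large-sample result (textbook theory for this model, assuming errors e = Y - (a*x + b) that are N(0, sigma^2) and independent of X; not something established earlier in this thread) makes "faster" more precise.  Both slope estimates converge at the same sqrt(n) rate; they differ only in the constant, with f(0) denoting the error density at its median:

```latex
\sqrt{n}\,(\hat a_{\mathrm{LS}} - a) \xrightarrow{d} N\!\left(0,\ \frac{\sigma^2}{\operatorname{Var}(X)}\right),
\qquad
\sqrt{n}\,(\hat a_{\mathrm{LAD}} - a) \xrightarrow{d} N\!\left(0,\ \frac{1}{4 f(0)^2 \operatorname{Var}(X)}\right)

f(0) = \frac{1}{\sigma\sqrt{2\pi}}
\;\Rightarrow\;
\frac{1}{4 f(0)^2} = \frac{\pi \sigma^2}{2},
\qquad
\frac{\operatorname{AVar}(\hat a_{\mathrm{LS}})}{\operatorname{AVar}(\hat a_{\mathrm{LAD}})} = \frac{2}{\pi} \approx 0.637
```

So under normal errors the absolute-error slope needs roughly pi/2, about 1.57, times as many observations to match the precision of the squared-error slope.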

In other words, suppose I have a process that first generates a
random x value and then generates a random Y value according to a
normal distribution with mean a*x + b (a computer could easily be
programmed to do this, pseudo-randomness aside).  The true values of
a and b are unknown to me.  After each new (X, Y) pair is generated,
I perform two regressions using all the data generated so far.  The
first minimizes squared errors and the second minimizes absolute
errors.  My contention is that both regressions will get closer and
closer to the line a*x + b used to generate the data, but the
squared-error estimate will converge to that line faster.
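The experiment described above can be sketched in Python.  This is a minimal sketch under illustrative assumptions: the values a = 2, b = 1, sigma = 1, X uniform on [0, 10], and the helper names ols_fit, lad_fit and compare are all hypothetical.  The absolute-error fit is computed by iteratively reweighted least squares, which is one of several ways to do least-absolute-deviations regression and gives an approximate solution.  Rather than refitting after every new pair, it repeats the experiment many times at a fixed sample size and compares the mean squared error of the two slope estimates.

```python
import random
import statistics


def ols_fit(xs, ys):
    """Least-squares (L2) line; returns (slope a, intercept b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def lad_fit(xs, ys, max_iter=50, tol=1e-7, eps=1e-9):
    """Least-absolute-deviations (L1) line via iteratively reweighted least squares.

    Each pass solves a weighted least-squares problem with weights 1/|residual|,
    so that sum(w * r**2) equals sum(|r|) at the current fit. eps caps the
    weight of points that lie (almost) exactly on the line.
    """
    a, b = ols_fit(xs, ys)  # start from the L2 solution
    for _ in range(max_iter):
        w = [1.0 / max(abs(y - (a * x + b)), eps) for x, y in zip(xs, ys)]
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        new_a = sxy / sxx
        new_b = my - new_a * mx
        converged = abs(new_a - a) < tol and abs(new_b - b) < tol
        a, b = new_a, new_b
        if converged:
            break
    return a, b


def compare(n=100, reps=300, a_true=2.0, b_true=1.0, sigma=1.0, seed=0):
    """Monte Carlo mean squared error of the fitted slope: (OLS, LAD)."""
    rng = random.Random(seed)
    se_ols = se_lad = 0.0
    for _ in range(reps):
        xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
        ys = [rng.gauss(a_true * x + b_true, sigma) for x in xs]  # Y | x ~ N(a*x + b, sigma^2)
        se_ols += (ols_fit(xs, ys)[0] - a_true) ** 2
        se_lad += (lad_fit(xs, ys)[0] - a_true) ** 2
    return se_ols / reps, se_lad / reps


if __name__ == "__main__":
    mse_ols, mse_lad = compare()
    print(f"slope MSE  OLS: {mse_ols:.6f}  LAD: {mse_lad:.6f}  ratio: {mse_ols / mse_lad:.3f}")
```

If the contention is right, the printed ratio should come out clearly below 1, and for large n theory predicts it approaches 2/pi, about 0.64.  Calling compare() with a few increasing values of n should show both mean squared errors shrinking toward zero, with the LAD error staying the larger of the two.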

I hope that's better worded.  Assuming you can understand what I'm 
saying, is it correct?
.
.
=================================================================
Instructions for joining and leaving this list, remarks about the
problem of INAPPROPRIATE MESSAGES, and archives are available at:
.                  http://jse.stat.ncsu.edu/                    .
=================================================================