Andrew Nikitin wrote:
> John Randall wrote:
>
>>The denominator of the sample mean really should be n, and for the
>>sample variance it really should be n-1 if you are estimating the
>>population mean from the sample.
>
> "Should" is such a strong word; I do not think it is justified here.
> Besides, you dropped the disclaimer "if you want unbiased estimation of
> population variance".
>
> What I mean is that unbiasedness is not sacred. For example, assuming
> normal X_i, E((1/(n-1)) \sum (X_i-\bar X)^2) gives an unbiased
> estimate, but E((1/n) \sum (X_i-\bar X)^2) gives another estimate
> which, though biased, is nevertheless optimal in the least squares
> sense. I am not saying that the least squares criterion is more sacred
> than unbiasedness, but neither is it the other way around.
>

Of course unbiasedness is not sacred, but it is the simple answer to
the question "why is the denominator n-1 instead of n?". If you just
want a measure of spread and your sample size is fixed, any
denominator will do. If you are comparing variances of samples of
different sizes, for example in partitioning sums of squares, the
choice of denominator becomes significant.
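To make the unbiasedness point concrete, here is a small sketch (not
from the original discussion) that averages both estimators over every
equally likely sample, using exact fractions. The fair-die population,
the sample size, and the function names are assumptions chosen only
for illustration.

```python
from fractions import Fraction
from itertools import product


def sum_of_squares(sample):
    """Sum of squared deviations from the sample mean (exact)."""
    n = len(sample)
    mean = Fraction(sum(sample), n)
    return sum((x - mean) ** 2 for x in sample)


def expected_estimate(population, n, denominator):
    """Average of SS/denominator over all equally likely samples of
    size n drawn with replacement from the population."""
    samples = list(product(population, repeat=n))
    total = sum(sum_of_squares(s) / denominator for s in samples)
    return total / len(samples)


# Hypothetical population: the faces of a fair die.
population = [1, 2, 3, 4, 5, 6]
n = 3

# True population variance (denominator is the population size).
mu = Fraction(sum(population), len(population))
sigma2 = sum((x - mu) ** 2 for x in population) / len(population)

unbiased = expected_estimate(population, n, n - 1)  # denominator n-1
biased = expected_estimate(population, n, n)        # denominator n

print(sigma2, unbiased, biased)
```

Under these assumptions the n-1 average equals the population variance
exactly, while the n average comes out smaller by the factor (n-1)/n.
Since that factor changes with n, variances of samples of different
sizes computed with denominator n are not directly comparable.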
I don't get your point about optimality in the least squares sense.
If you are talking about curve fitting or regression, the minimization
is different. You have data {(x_i,y_i)} and you assume all the error
is in the y direction. You have a parameterized model f, and you
minimize \sum (y_i-f(x_i))^2 over all parameters of the model. For
example, for least squares straight line fitting, you let f(x)=a+bx,
and minimize the sum of squares with respect to a and b. I don't see
where the value of the denominator is relevant: you get the same
result for any denominator. If you are partitioning the sum of
squares, you are back to the previous point.
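As a sketch of why the denominator drops out of straight line fitting,
the following solves the normal equations for f(x)=a+bx after scaling
the sum of squares by 1/denominator. The data values and the name
fit_line are made up for illustration.

```python
from fractions import Fraction


def fit_line(xs, ys, denominator=1):
    """Minimise (1/denominator) * sum((y - (a + b*x))**2) over a and b
    by solving the 2x2 normal equations with Cramer's rule."""
    w = Fraction(1, denominator)
    # Every coefficient of the normal equations carries the same
    # scale factor w, so it cancels in the ratios below.
    s1 = w * len(xs)
    sx = w * sum(xs)
    sxx = w * sum(x * x for x in xs)
    sy = w * sum(ys)
    sxy = w * sum(x * y for x, y in zip(xs, ys))
    det = s1 * sxx - sx * sx
    a = (sy * sxx - sx * sxy) / det
    b = (s1 * sxy - sx * sy) / det
    return a, b


# Hypothetical data, chosen only to have something to fit.
xs = [0, 1, 2, 3]
ys = [1, 3, 4, 8]

n = len(xs)
print(fit_line(xs, ys, n))      # objective divided by n
print(fit_line(xs, ys, n - 1))  # objective divided by n-1
```

Both calls return the same intercept and slope, because dividing the
objective by a positive constant does not move its minimum.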
> Bottom line: one cannot just plug numbers into a formula; one must
> know what she is doing.
I completely agree. I would add: one cannot just run analyses using
SPSS or some other program. The most common errors are ignoring the
distribution of the test statistic (e.g. assuming everything is
normal) or assuming that categorical data is ranked or linear.
Best wishes,
John