Re: [R] A basic statistics question

peter dalgaard Fri, 15 Aug 2014 05:43:19 -0700

On 13 Aug 2014, at 20:49 , (Ted Harding) <ted.hard...@wlandres.net> wrote:


> Indeed, this topic has got me wondering how many times I may have
> blindly used sd(x) in the past, as if it were going to give me the
> standard sqrt(sum((x - mean(x))^2)/length(x)) result!


At the risk of flogging a horse that has been dead for the better part of a 
century, I don't think there is anything "standard" about an SD with a divisor 
of N, and the unbiasedness of the variance estimate with N-1 divisor is not really the 
crucial issue. Rather, the distinction is between 

- one sample from a known finite distribution
- multiple samples from an unknown distribution

and in particular between whether the mean is estimated or known. 
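
As a rough illustration of the known-versus-estimated-mean point (simulated 
data; the names mu, sd_known_mean, sd_est_mean and sd_div_N are made up for 
this sketch): with a known mean the natural divisor is N, whereas sd() takes 
deviations from mean(x) and divides by N-1.

```r
# Illustrative only: simulated data whose true mean we happen to know.
set.seed(42)
mu <- 10                       # assumed known for the "known mean" case
x  <- rnorm(20, mean = mu, sd = 3)
N  <- length(x)

# Mean known: deviations are taken from mu, and the divisor is N.
sd_known_mean <- sqrt(sum((x - mu)^2) / N)

# Mean estimated: deviations are taken from mean(x); sd() divides by N - 1.
sd_est_mean <- sqrt(sum((x - mean(x))^2) / (N - 1))
stopifnot(isTRUE(all.equal(sd_est_mean, sd(x))))

# The divisor-N version about mean(x) is smaller than sd(x)
# by the factor sqrt((N - 1) / N).
sd_div_N <- sqrt(sum((x - mean(x))^2) / N)
stopifnot(isTRUE(all.equal(sd_div_N, sd(x) * sqrt((N - 1) / N))))
```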

One argument for the N-1 divisor in the normal case is that you can transform 
data to one observation with unknown mean and N-1 independent observations with 
mean known to be 0. The variance estimate will be a function of the N-1 
variables, and thus there is no reason to let the mere existence of the 
uninformative Nth variable change the estimator.
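
One way to see this transformation concretely is sketched below. It assumes 
scaled Helmert contrasts as the orthonormal basis (any orthonormal basis 
orthogonal to the constant vector would do), and the names H, z and z0 are 
only for illustration.

```r
set.seed(1)
N <- 8
x <- rnorm(N, mean = 5, sd = 2)

# Columns of contr.helmert(N) are mutually orthogonal and each sums to zero;
# scaling them to unit length gives an orthonormal set of N - 1 contrasts.
H <- contr.helmert(N)
H <- sweep(H, 2, sqrt(colSums(H^2)), "/")

# N - 1 transformed observations, each with mean 0 whatever the mean of x is.
z <- drop(crossprod(H, x))

# The one remaining direction carries the information about the mean.
z0 <- sum(x) / sqrt(N)

# The known-mean estimate from z (divisor = number of z's, i.e. N - 1)
# coincides with the usual N - 1 estimate from x.
stopifnot(length(z) == N - 1)
stopifnot(isTRUE(all.equal(mean(z^2), var(x))))
```

The Nth variable z0 never enters the variance estimate, which is the sense in 
which it is uninformative about the variance.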

Of course, few people really care about N vs. N-1, but in larger linear models 
the divisor becomes N-p, where p is the number of estimated parameters, and p 
can be a sizeable fraction of N.
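
A small simulated example of the N-p divisor, with N = 30 and p = 10 chosen 
arbitrarily so that p is a third of N (X, y, fit and rss are illustrative 
names, not from any real data set):

```r
set.seed(7)
N <- 30
p <- 10                                   # coefficients, intercept included
X <- matrix(rnorm(N * (p - 1)), N, p - 1)
y <- drop(1 + X %*% rep(0.5, p - 1) + rnorm(N))

fit <- lm(y ~ X)
rss <- sum(resid(fit)^2)                  # residual sum of squares

# lm() uses N - p residual degrees of freedom for the variance estimate.
stopifnot(df.residual(fit) == N - p)
stopifnot(isTRUE(all.equal(summary(fit)$sigma^2, rss / (N - p))))

rss / N         # divisor N: too small by the factor (N - p) / N
rss / (N - p)   # divisor N - p: what summary(fit)$sigma^2 reports
```

Here the divisor-N estimate is only two thirds of the N-p one, which is no 
longer a negligible difference.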

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd....@cbs.dk  Priv: pda...@gmail.com

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
