On 29-Nov-09 20:15 PM, Robin wrote:
> On Mon, Nov 30, 2009 at 12:30 AM, Colin J. Williams <c...@ncf.ca> wrote:
>> On 29-Nov-09 17:13 PM, Dr. Phillip M. Feldman wrote:
>>> All of the statistical packages that I am currently using and have used in
>>> the past (Matlab, Minitab, R, S-plus) calculate standard deviation using the
>>> sqrt(1/(n-1)) normalization, which gives a result that is unbiased when
>>> sampling from a normally-distributed population. NumPy uses the sqrt(1/n)
>>> normalization. I'm currently using the following code to calculate standard
>>> deviations, but would much prefer if this could be fixed in NumPy itself:
>>>
>>> import numpy
>>>
>>> def mystd(x=numpy.array([]), axis=None):
>>>     """This function calculates the standard deviation of the input using
>>>     the definition of standard deviation that gives an unbiased result for
>>>     samples from a normally-distributed population."""
>>>
>>>     # keepdims=True lets the mean broadcast against x for any axis
>>>     xd = x - x.mean(axis=axis, keepdims=True)
>>>     return numpy.sqrt((xd*xd).sum(axis=axis) / (numpy.size(x, axis=axis) - 1.0))
>>>
>> Anne Archibald has suggested a work-around. Perhaps ddof could be set, by
>> default, to 1, as other values are rarely required.
>>
>> Where the distribution of a variate is not known a priori, then I believe
>> that it can be shown that the n-1 divisor provides the best estimate of the
>> variance.
>>
> There have been previous discussions on this (but I can't find them now)
> and I believe the current default was chosen deliberately. I think it is
> the view of the numpy developers that the n divisor has more desirable
> properties in most cases than the traditional n-1 - see this paper by
> Travis Oliphant for details:
> http://hdl.handle.net/1877/438
>
> Cheers
>
> Robin
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>

The conventional approach, based on the notion of expected values, is given
here:
http://en.wikipedia.org/wiki/Variance#Distribution_of_the_sample_variance
I would suggest that numpy should stick with that until the approach
advocated in:
http://hdl.handle.net/1877/438
is generally accepted.

Thomas Bayes introduced some nebulous ideas that might not be relevant for
most cases when one is trying to find a confidence interval for a mean:
http://en.wikipedia.org/wiki/Thomas_bayes#Bayes.27_theorem

Colin W.
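
For reference, the ddof work-around mentioned in the thread can be sketched as follows. This is only an illustration: the helper name sample_std and the data values are made up for the example, while numpy.std and its ddof argument are the existing NumPy interface. The divisor used is n - ddof, so ddof=0 gives the n divisor and ddof=1 gives the n-1 divisor.

```python
import numpy as np


def sample_std(x, axis=None, ddof=1):
    """Hand-rolled standard deviation with divisor (n - ddof).

    Hypothetical helper, written only to show what ddof does.
    """
    x = np.asarray(x, dtype=float)
    # Number of observations along the reduced axis (or in total)
    n = x.size if axis is None else x.shape[axis]
    # keepdims=True keeps the mean broadcastable against x
    xd = x - x.mean(axis=axis, keepdims=True)
    return np.sqrt((xd * xd).sum(axis=axis) / (n - ddof))


# Illustrative data: mean 5, sum of squared deviations 32
data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

std_n = np.std(data)           # divisor n (ddof=0, the NumPy default)
std_n1 = np.std(data, ddof=1)  # divisor n - 1

print(std_n, std_n1, sample_std(data))
```

With the n divisor the result is sqrt(32/8) = 2.0; with n-1 it is sqrt(32/7), which is slightly larger. The gap between the two shrinks as n grows.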