Re: [Numpy-discussion] var bias reason?

Bruce Southey Wed, 15 Oct 2008 09:25:45 -0700

Hi,

While I disagree, I really do not care because this is documented. But perhaps a clear warning is needed at the start so it is clear what the default ddof means, instead of it being buried in the Notes section.

Also, I am surprised that you did not directly reference the Stein estimator (your minimum mean-squared estimator) and its known effects in your paper:

http://en.wikipedia.org/wiki/James-Stein_estimator

So I did not find this any different from what is already known about the Stein estimator.


Bruce

PS While I may have gotten access via my University, I did get it from the "Access this item" link:

https://contentdm.lib.byu.edu/cgi-bin/showfile.exe?CISOROOT=/EER&CISOPTR=134&filename=135.pdf

Travis E. Oliphant wrote:

Gabriel Gellner wrote:
Some colleagues noticed that var uses biased formulas by default in numpy,
searching for the reason only brought up:

http://article.gmane.org/gmane.comp.python.numeric.general/12438/match=var+bias

which I totally agree with, but there was no response? Any reason for this?

I will try to respond to this as it was me who made the change. I think there have been responses, but I think I've preferred to stay quiet rather than feed a flame war. Ultimately, it is a matter of preference and I don't think there would be equal weights given to all the arguments surrounding the decision by everybody.
I will attempt to articulate my reasons: dividing by n is the maximum likelihood estimator of variance and I prefer that justification more than the "un-biased" justification for a default (especially given that bias is just one part of the "error" in an estimator). Having every package that computes the mean return the "un-biased" estimate gives it more cultural weight than the concept deserves, I think. Any surprise that is created by the different default should be mitigated by the fact that it's an opportunity to learn something about what you are doing. Here is a paper I wrote on the subject that you might find useful:
https://contentdm.lib.byu.edu/cdm4/item_viewer.php?CISOROOT=/EER&CISOPTR=134&CISOBOX=1&REC=1
(Hopefully, they will resolve a link problem at the above site soon, but you can read the abstract).
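The remark that bias is only one part of an estimator's error can be checked numerically. What follows is a minimal sketch, not taken from the paper: it assumes normally distributed data, and the sample size, trial count and seed are illustrative choices. It compares the bias and mean squared error obtained when the sum of squared deviations is divided by n - 1, n and n + 1.

```python
import numpy as np

# Illustrative assumptions (not from the original post): standard normal
# data, sample size 10, 50,000 simulated samples, fixed seed.
rng = np.random.default_rng(0)
n, trials, true_var = 10, 50_000, 1.0

samples = rng.normal(0.0, np.sqrt(true_var), size=(trials, n))

# Sum of squared deviations from each sample's own mean.
ss = ((samples - samples.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)

results = {}
for divisor in (n - 1, n, n + 1):
    est = ss / divisor                      # variance estimate per sample
    bias = est.mean() - true_var            # average error
    mse = ((est - true_var) ** 2).mean()    # average squared error
    results[divisor] = (bias, mse)
    print(f"divisor={divisor:2d}  bias={bias:+.4f}  mse={mse:.4f}")
```

For normal data, theory gives a mean squared error of 2/(n - 1), (2n - 1)/n^2 and 2/(n + 1) times the squared true variance for these three divisors, so the unbiased divisor n - 1 should show the largest squared error of the three. Other distributions are not covered by this sketch.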
I'm not trying to persuade anybody with this email (although if you can download the paper at the above link, then I am trying to persuade with that). In this email I'm just trying to give context to the poster as I think the question is legitimate.
With that said, there is the ddof parameter so that you can change what the divisor is. I think that is a useful compromise.
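As a small illustration of how ddof changes the divisor, here is a sketch using made-up data (the array values are an assumption chosen only so the arithmetic is easy to follow). The divisor used by np.var is n - ddof, so ddof=0 gives the default divide-by-n result and ddof=1 gives the "un-biased" one.

```python
import numpy as np

# Made-up data for illustration; mean is 5, squared deviations sum to 32.
x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
n = x.size
ss = ((x - x.mean()) ** 2).sum()

var_default = np.var(x)         # ddof=0 (default): divides by n
var_ddof1 = np.var(x, ddof=1)   # divides by n - 1

print(var_default, ss / n)        # both are 32 / 8
print(var_ddof1, ss / (n - 1))    # both are 32 / 7
```

The same ddof argument is accepted by np.std, so the standard deviation can be switched in the same way.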
I'm unhappy with the internal inconsistency of cov, as I think it was an oversight. I'd be happy to see cov changed as well to use the ddof argument instead of the bias keyword, but that is an API change and requires some transition discussion and work.
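The inconsistency can be seen with a short sketch on made-up data (the array is an assumption for illustration only): with default arguments np.cov divides by n - 1 while np.var divides by n, and the bias keyword of cov is what brings the two into agreement.

```python
import numpy as np

# Made-up one-dimensional data, treated by np.cov as a single variable.
x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

cov_default = float(np.cov(x))            # default: divides by n - 1
cov_biased = float(np.cov(x, bias=True))  # bias=True: divides by n

print(cov_default, np.var(x, ddof=1))  # these two agree
print(cov_biased, np.var(x))           # and so do these
```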
The only other argument I've heard against the current situation is "unit testing" with MATLAB or R code. My suggestion is to just use ddof=1 when comparing against MATLAB and R code.
Best regards,

-Travis

_______________________________________________
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion
