Hi,
While I disagree, I really do not care because this is documented. But perhaps a clear warning is needed at the start so it is clear what the default ddof means, instead of it being buried in the Notes section.

Also, I am surprised that you did not directly reference the James-Stein estimator (your minimum mean-squared estimator) and its known effects in your paper:
http://en.wikipedia.org/wiki/James-Stein_estimator
So I did not find this any different from what is already known about the Stein estimator.

Bruce

PS: While I may have gotten access via my university, I did get it from the "Access this item" link:
https://contentdm.lib.byu.edu/cgi-bin/showfile.exe?CISOROOT=/EER&CISOPTR=134&filename=135.pdf
Travis E. Oliphant wrote:
Gabriel Gellner wrote:
Some colleagues noticed that var uses the biased formula by default in numpy,
searching for the reason only brought up:

http://article.gmane.org/gmane.comp.python.numeric.general/12438/match=var+bias

which I totally agree with, but there was no response? Any reason for this?
I will try to respond to this, as I was the one who made the change. There have been responses, but I have preferred to stay quiet rather than feed a flame war. Ultimately, it is a matter of preference, and I do not think everybody would give equal weight to all the arguments surrounding the decision.

I will attempt to articulate my reasons: dividing by n is the maximum likelihood estimator of the variance, and I prefer that justification to the "un-biased" justification for a default (especially given that bias is just one part of the "error" in an estimator). Having every package that computes the variance return the "un-biased" estimate gives the concept more cultural weight than it deserves, I think. Any surprise created by the different default should be mitigated by the fact that it is an opportunity to learn something about what you are doing. Here is a paper I wrote on the subject that you might find useful:

https://contentdm.lib.byu.edu/cdm4/item_viewer.php?CISOROOT=/EER&CISOPTR=134&CISOBOX=1&REC=1
(Hopefully, they will resolve a link problem at the above site soon, but you can read the abstract).
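The two divisors at issue can be sketched in a few lines of NumPy (a minimal illustration of my own, not taken from the paper):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
n = x.size
dev = x - x.mean()

# Maximum-likelihood estimate of the variance: divide by n
mle = np.sum(dev**2) / n
# "Un-biased" estimate: divide by n - 1
unbiased = np.sum(dev**2) / (n - 1)

# NumPy's default matches the maximum-likelihood divisor (ddof=0):
print(np.isclose(mle, np.var(x)))            # True
print(np.isclose(unbiased, np.var(x, ddof=1)))  # True
```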

I'm not trying to persuade anybody with this email (although if you can download the paper at the above link, then I am trying to persuade with that). In this email I'm just trying to give context to the poster as I think the question is legitimate.

With that said, there is the ddof parameter so that you can change what the divisor is. I think that is a useful compromise.

I'm unhappy with the internal inconsistency of cov, as I think it was an oversight. I'd be happy to see cov changed as well to use the ddof argument instead of the bias keyword, but that is an API change and requires some transition discussion and work.
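For illustration, the inconsistency looks like this: cov takes a boolean bias keyword and defaults to dividing by n - 1, while var takes ddof and defaults to dividing by n (a quick sketch, not a proposal for the new API):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])

# var defaults to the divisor n (ddof=0):
print(np.var(x))                       # 1.25
# cov defaults to the divisor n - 1; bias=True switches it to n:
print(float(np.cov(x)))                # divisor n - 1
print(float(np.cov(x, bias=True)))     # divisor n, matches np.var(x)
```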

The only other argument I have heard against the current situation is "unit testing" against MATLAB or R code. My suggestion is simply to use ddof=1 when comparing against MATLAB and R.
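Concretely, that comparison looks like this (a small sketch; MATLAB's var() and R's var() both divide by n - 1):

```python
import numpy as np

data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# NumPy's default divides by n:
print(np.var(data))           # 4.0
# Match MATLAB's var() and R's var() by dividing by n - 1:
print(np.var(data, ddof=1))
```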

Best regards,

-Travis

_______________________________________________
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion

