Hi,
While I disagree, I really do not care because this is documented. But
perhaps a clear warning is needed at the start so it is clear what the
default ddof means, instead of it being buried in the Notes section.
Also I am surprised that you did not directly reference the Stein
estimator (your minimum mean-squared estimator) and known effects in
your paper:
http://en.wikipedia.org/wiki/James-Stein_estimator
So I did not find this any different from what is already known about
the Stein estimator.
Bruce
PS While I may have gotten access via my University, I did get it from
the link *Access this item*:
https://contentdm.lib.byu.edu/cgi-bin/showfile.exe?CISOROOT=/EER&CISOPTR=134&filename=135.pdf
Travis E. Oliphant wrote:
Gabriel Gellner wrote:
Some colleagues noticed that var uses biased formulas by default in numpy,
searching for the reason only brought up:
http://article.gmane.org/gmane.comp.python.numeric.general/12438/match=var+bias
which I totally agree with, but there was no response? Any reason for this?
I will try to respond to this as it was me who made the change. I think
there have been responses, but I think I've preferred to stay quiet
rather than feed a flame war. Ultimately, it is a matter of preference
and I don't think there would be equal weights given to all the
arguments surrounding the decision by everybody.
I will attempt to articulate my reasons: dividing by n is the maximum
likelihood estimator of variance and I prefer that justification more
than the "un-biased" justification for a default (especially given that
bias is just one part of the "error" in an estimator). Having every
package that computes the mean return the "un-biased" estimate gives it
more cultural weight than the concept deserves, I think. Any
surprise that is created by the different default should be mitigated by
the fact that it's an opportunity to learn something about what you are
doing. Here is a paper I wrote on the subject that you might find
useful:
https://contentdm.lib.byu.edu/cdm4/item_viewer.php?CISOROOT=/EER&CISOPTR=134&CISOBOX=1&REC=1
(Hopefully, they will resolve a link problem at the above site soon, but
you can read the abstract).
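The remark above that bias is only one part of an estimator's error can be checked numerically. The following is a minimal sketch, not taken from the paper: it assumes normally distributed data, and the sample size, trial count, and seed are arbitrary illustrative choices. It compares the divisors n - 1, n, and n + 1 by bias and by mean squared error.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the run is repeatable
n, reps = 10, 200_000            # illustrative sample size and trial count
true_var = 1.0                   # variance of the simulated population

# Each row is one sample of size n drawn from a normal distribution.
samples = rng.normal(0.0, np.sqrt(true_var), size=(reps, n))

# Sum of squared deviations from each sample's own mean.
ss = ((samples - samples.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)

results = {}
for divisor in (n - 1, n, n + 1):
    est = ss / divisor                       # variance estimate per sample
    bias = est.mean() - true_var             # average error
    mse = ((est - true_var) ** 2).mean()     # average squared error
    results[divisor] = (bias, mse)
    print(f"divisor={divisor:2d}  bias={bias:+.4f}  mse={mse:.4f}")
```

Under these assumptions the n - 1 divisor should show a bias close to zero yet the largest mean squared error of the three, which is the sense in which unbiasedness alone does not settle the choice of default.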
I'm not trying to persuade anybody with this email (although if you can
download the paper at the above link, then I am trying to persuade with
that). In this email I'm just trying to give context to the poster as I
think the question is legitimate.
With that said, there is the ddof parameter so that you can change what
the divisor is. I think that is a useful compromise.
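As a minimal sketch of what ddof changes (the sample values are made up for illustration): np.var divides the sum of squared deviations by N - ddof, so the default ddof=0 divides by N and ddof=1 divides by N - 1.

```python
import numpy as np

# Made-up sample, purely for illustration.
x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

n = x.size
ss = ((x - x.mean()) ** 2).sum()   # sum of squared deviations from the mean

var_default = np.var(x)            # ddof=0 (default): divides by n
var_ddof1 = np.var(x, ddof=1)      # ddof=1: divides by n - 1

print(var_default, ss / n)         # these two values agree
print(var_ddof1, ss / (n - 1))     # and so do these two
```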
I'm unhappy with the internal inconsistency of cov, as I think it was an
oversight. I'd be happy to see cov changed as well to use the ddof
argument instead of the bias keyword, but that is an API change and
requires some transition discussion and work.
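The inconsistency can be seen on a small made-up sample. This sketch uses only the bias keyword mentioned above and assumes the defaults described in this thread: var divides by N, while cov divides by N - 1 unless bias=True is passed.

```python
import numpy as np

# Made-up sample, purely for illustration.
x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

var_n = np.var(x)                   # var default: divides by n
cov_default = np.cov(x)             # cov default: divides by n - 1
cov_biased = np.cov(x, bias=True)   # bias=True makes cov divide by n

print(var_n, cov_default, cov_biased)
```

With the defaults, the variance reported by cov matches var only when ddof=1 is passed to var or bias=True is passed to cov.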
The only other argument I've heard against the current situation is
"unit testing" with MATLAB or R code. Just use ddof=1 when comparing
against MATLAB and R code is my suggestion.
Best regards,
-Travis
_______________________________________________
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion