On Jul 15, 2013 11:47 AM, "Charles R Harris" <charlesr.har...@gmail.com> wrote:
>
> On Mon, Jul 15, 2013 at 8:58 AM, Charles R Harris <
> charlesr.har...@gmail.com> wrote:
>
>>
>> On Mon, Jul 15, 2013 at 8:34 AM, Sebastian Berg <
>> sebast...@sipsolutions.net> wrote:
>>
>>> On Mon, 2013-07-15 at 07:52 -0600, Charles R Harris wrote:
>>> >
>>> > On Sun, Jul 14, 2013 at 3:35 PM, Charles R Harris
>>> > <charlesr.har...@gmail.com> wrote:
>>> >
>>>
>>> <snip>
>>>
>>> >
>>> > For nansum, I would expect 0 even in the case of all
>>> > nans. The point
>>> > of these functions is to simply ignore nans, correct?
>>> > So I would aim
>>> > for this behaviour: nanfunc(x) behaves the same as
>>> > func(x[~isnan(x)])
>>> >
>>> >
>>> > Agreed, although that changes current behavior. What about the
>>> > other cases?
>>> >
>>> >
>>> >
>>> > Looks like there isn't much interest in the topic, so I'll just go
>>> > ahead with the following choices:
>>> >
>>> > Non-NaN case
>>> >
>>> > 1) Empty array -> ValueError
>>> >
>>> > The current behavior with stats is an accident, i.e., the nan arises
>>> > from 0/0. I like to think that in this case the result is any number,
>>> > rather than not a number, so *the* value is simply not defined. So in
>>> > this case raise a ValueError for empty array.
>>> >
>>> To be honest, I don't mind the current behaviour much sum([]) = 0,
>>> len([]) = 0, so it is in a way well defined. At least I am not sure if I
>>> would prefer always an error. I am a bit worried that just changing it
>>> might break code out there, such as plotting code where it makes
>>> perfectly sense to plot a NaN (i.e. nothing), but if that is the case it
>>> would probably be visible fast.
>>>
>>> > 2) ddof >= n -> ValueError
>>> >
>>> > If the number of elements, n, is not zero and ddof >= n, raise a
>>> > ValueError for the ddof value.
>>> >
>>> Makes sense to me, especially for ddof > n. Just returning nan in all
>>> cases for backward compatibility would be fine with me too.
>>>
>>
>> Currently if ddof > n it returns a negative number for variance, the NaN
>> only comes when ddof == 0 and n == 0, leading to 0/0 (float is NaN, integer
>> is zero division).
>>
>>
>>>
>>> > Nan case
>>> >
>>> > 1) Empty array -> Value Error
>>> > 2) Empty slice -> NaN
>>> > 3) For slice ddof >= n -> Nan
>>> >
>>> Personally I would somewhat prefer if 1) and 2) would at least default
>>> to the same thing. But I don't use the nanfuncs anyway. I was wondering
>>> about adding the option for the user to pick what the fill is (and i.e.
>>> if it is None (maybe default) -> ValueError). We could also allow this
>>> for normal reductions without an identity, but I am not sure if it is
>>> useful there.
>>>
>>
>> In the NaN case some slices may be empty, others not. My reasoning is
>> that that is going to be data dependent, not operator error, but if the
>> array is empty the writer of the code should deal with that.
>>
>>
> In the case of the nanvar, nanstd, it might make more sense to handle ddof
> as
>
> 1) if ddof is >= axis size, raise ValueError
> 2) if ddof is >= number of values after removing NaNs, return NaN
>
> The first would be consistent with the non-nan case, the second accounts
> for the variable nature of data containing NaNs.
>
> Chuck
>
>

I think this is a good idea in that it naturally follows well with the
conventions of what to do with empty arrays / empty slices with nanmean,
etc. Note, however, I am not a very big fan of the idea of having two
different behaviors for what I see as semantically the same thing. But, my
objections are not strong enough to veto it, and I do think this proposal
is well thought-out.

Ben Root
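
For concreteness, here is a rough 1-D sketch of the two ddof rules Chuck proposes for nanvar, together with the ValueError for an empty array. It is only an illustration under stated assumptions: the name nanvar_sketch is made up for this message, it handles a single axis only, and it is not NumPy's implementation.

```python
import numpy as np


def nanvar_sketch(x, ddof=0):
    """Hypothetical 1-D illustration of the ddof rules proposed in this thread.

    Not NumPy's implementation; the name and structure are assumptions.
    """
    x = np.asarray(x, dtype=float)
    if x.ndim != 1:
        raise ValueError("this sketch only handles 1-D input")
    n = x.size
    if n == 0:
        # Empty array: treated as operator error.
        raise ValueError("empty array")
    if ddof >= n:
        # Rule 1: ddof >= axis size is an error, as in the non-NaN case.
        raise ValueError("ddof must be less than the axis size")
    # Ignore NaNs, i.e. work on x[~isnan(x)].
    kept = x[~np.isnan(x)]
    if ddof >= kept.size:
        # Rule 2: too few values left after dropping NaNs is data
        # dependent, so return NaN instead of raising.
        return np.nan
    return np.sum((kept - kept.mean()) ** 2) / (kept.size - ddof)
```

With this sketch, nanvar_sketch([1.0, nan, nan], ddof=1) returns NaN because only one value survives the NaN filter, while nanvar_sketch([1.0, 2.0], ddof=2) raises ValueError because ddof is not smaller than the axis size. An all-NaN input of non-zero length also returns NaN, which matches "Empty slice -> NaN".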
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion