On Wed, 2012-11-21 at 22:58 -0500, [email protected] wrote:
> On Wed, Nov 21, 2012 at 10:35 PM, Charles R Harris
> <[email protected]> wrote:
> >
> >
> > On Wed, Nov 21, 2012 at 7:45 PM, <[email protected]> wrote:
> >>
> >> On Wed, Nov 21, 2012 at 9:22 PM, Olivier Delalleau <[email protected]> wrote:
> >> > Current behavior looks sensible to me. I personally would prefer no
> >> > warning
> >> > but I think it makes sense to have one as it can be helpful to detect
> >> > issues
> >> > faster.
> >>
> >> I agree that nan should be the correct answer.
> >> (I gave up trying to define a default for 0/0 in scipy.stats ttests.)
> >>
> >> some funnier cases
> >>
> >> >>> np.var([1], ddof=1)
> >> 0.0
> >
> >
> > This one is a nan in development.
> >
> >>
> >> >>> np.var([1], ddof=5)
> >> -0
> >> >>> np.var([1,2], ddof=5)
> >> -0.16666666666666666
> >> >>> np.std([1,2], ddof=5)
> >> nan
> >>
> >
> > These still do this. Also
> >
> > In [10]: var([], ddof=1)
> > Out[10]: -0
> >
> > Which suggests that the nan is pretty much an accidental byproduct of
> > division by zero. I think it might make sense to have a definite policy for
> > these corner cases.
>
> It would also be consistent with the usual pattern to raise a
> ValueError on this. ddof too large, size too small.
> It wouldn't be the case that for some columns or rows we get valid
> answers in this case, as long as we don't allow for missing values.
>
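For reference, the odd values in the quoted examples (-0, -0.1666..., nan) are what the textbook formula sum((x - mean)**2) / (N - ddof) produces when nothing guards the denominator: N - ddof is zero for np.var([1], ddof=1) and negative for the ddof=5 cases, and std then takes the square root of a negative number. For an empty input the mean is 0/0 = nan, the sum over no elements is 0.0, and 0.0 / (0 - 1) gives -0.0. A minimal sketch assuming only NumPy; `textbook_var` and `textbook_std` are hypothetical helpers, not NumPy's actual implementation:

```python
import numpy as np

def textbook_var(values, ddof=0):
    """Hypothetical helper (not NumPy's implementation): sum of squared
    deviations divided by (N - ddof), with no guard on the denominator."""
    x = np.asarray(values, dtype=np.float64)
    n = x.size
    # Silence the floating-point warnings so only the raw results show.
    with np.errstate(divide="ignore", invalid="ignore"):
        mean = x.sum() / n            # 0.0 / 0 -> nan for an empty input
        ss = np.sum((x - mean) ** 2)  # 0.0 for empty or single-element input
        return ss / (n - ddof)        # denominator may be zero or negative

def textbook_std(values, ddof=0):
    """Hypothetical helper: square root of textbook_var."""
    with np.errstate(invalid="ignore"):
        return np.sqrt(textbook_var(values, ddof))  # sqrt(negative) -> nan

print(textbook_var([1], ddof=1))     # nan   (0 / 0)
print(textbook_var([1], ddof=5))     # -0.0  (0 / -4)
print(textbook_var([1, 2], ddof=5))  # -0.16666666666666666  (0.5 / -3)
print(textbook_std([1, 2], ddof=5))  # nan   (sqrt of a negative)
print(textbook_var([], ddof=1))      # -0.0  (0 / -1)
```

So the negative results and the nan from std are both side effects of the same unguarded N - ddof, which is the "accidental byproduct" point made in the quoted discussion.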
It seems to me that nan is the reasonable result for these operations
(reduce-like operations that do not have an identity). Admittedly,
reductions without an identity raise a ValueError (e.g.
`np.minimum.reduce([])`), but mean/std/var seem special enough to behave
differently from other reductions (for example, their result is always
floating point). As for usability, when plotting errorbars using std, for
example, it would be rather annoying to get a ValueError, so if anything
the reduce machinery could return more special results for empty
floating-point reductions.

In any case, the warning should be clearer, and for a ddof that is too
large I would say it should also return nan plus a warning.

Sebastian

>
> quick check with np.ma
>
> looks correct except when delegating to numpy ?
>
> >>> s = np.ma.var(np.ma.masked_invalid([[1.,2],[1,np.nan]]), ddof=5, axis=0)
> >>> s
> masked_array(data = [-- --],
>              mask = [ True True],
>        fill_value = 1e+20)
>
> >>> s = np.ma.var(np.ma.masked_invalid([[1.,2],[1,np.nan]]), ddof=1, axis=0)
> >>> s
> masked_array(data = [0.0 --],
>              mask = [False True],
>        fill_value = 1e+20)
>
> >>> s = np.ma.std([1,2], ddof=5)
> >>> s
> masked
> >>> type(s)
> <class 'numpy.ma.core.MaskedConstant'>
>
> >>> np.ma.var([1,2], ddof=5)
> -0.16666666666666666
>
>
> Josef
>
>
> >
> > <snip>
> >
> > Chuck
> >
> >
> >
> > _______________________________________________
> > NumPy-Discussion mailing list
> > [email protected]
> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
> >
>
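A rough sketch of the policy suggested here (nan plus a warning when ddof is too large), next to the identity/no-identity behaviour of ufunc reductions that `np.minimum.reduce([])` illustrates. `guarded_var` is a hypothetical helper and its warning text is made up; this is not how NumPy implements var:

```python
import warnings
import numpy as np

def guarded_var(values, ddof=0):
    """Hypothetical sketch of the proposed policy: when N - ddof <= 0,
    return nan and emit a RuntimeWarning instead of a negative result."""
    x = np.asarray(values, dtype=np.float64)
    dof = x.size - ddof
    if dof <= 0:
        # Assumed warning message; only the category (RuntimeWarning) matters here.
        warnings.warn(
            f"degrees of freedom <= 0 (N={x.size}, ddof={ddof}); returning nan",
            RuntimeWarning,
            stacklevel=2,
        )
        return np.float64(np.nan)
    mean = x.mean()
    return np.sum((x - mean) ** 2) / dof

# A reduction with an identity returns that identity for empty input ...
print(np.add.reduce([]))  # 0.0
# ... while one without an identity raises ValueError.
try:
    np.minimum.reduce([])
except ValueError as exc:
    print("ValueError:", exc)
```

Returning nan keeps something like errorbar plotting working for empty or too-short slices while the warning still flags the problem; raising a ValueError instead would mirror the `np.minimum.reduce([])` behaviour.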
