On Wed, Apr 25, 2018 at 10:07 PM, Eric Wieser <wieser.eric+nu...@gmail.com> wrote:
> what does that gain over having the user do something like result.astype()
>
> It means that the user can use integer weights without worrying about
> losing precision due to an intermediate float representation.
>
> It also means they can use higher precision values (np.longdouble) or
> complex weights.
>

None of that seems particularly important to be honest.

> you’re emitting warnings for everyone
>
> When there’s a risk of precision loss, that seems like the responsible
> thing to do.
>

For precision loss of the order of float64 eps, I disagree. There will be
many such places in numpy and in other core libraries.

> Users passing float weights would see no warning, I suppose.
>
> is this really worth a new function
>
> There ought to be a function for computing histograms with integer weights
> that doesn’t lose precision. Either we change the existing function to do
> that, or we make a new function.
>

It's also possible to refer users to scipy.stats.binned_statistic(_2d/dd),
which provides a superset of the histogram functionality and is internally
consistent because the implementations of 1d/2d call the dd one.

Ralf

> A possible compromise: like 1, but only change the dtype of the result if
> a weights argument is passed.
>
> #10864 <https://github.com/numpy/numpy/issues/10864> seems like a
> worrying design flaw too, but I suppose that can be dealt with separately.
>
> Eric
>
> On Wed, 25 Apr 2018 at 21:57 Ralf Gommers <ralf.gomm...@gmail.com> wrote:
>
>> On Mon, Apr 9, 2018 at 10:24 PM, Eric Wieser <wieser.eric+nu...@gmail.com>
>> wrote:
>>
>>> Numpy has three histogram functions - histogram, histogram2d, and
>>> histogramdd.
>>>
>>> histogram is by far the most widely used, and in the absence of weights
>>> and normalization, returns an np.intp count for each bin.
>>>
>>> histogramdd (for which histogram2d is a wrapper) returns np.float64 in
>>> all circumstances.
>>>
>>> As a contrived comparison
>>>
>>> >>> x = np.linspace(0, 1)
>>> >>> h, e = np.histogram(x*x, bins=4); h
>>> array([25, 10, 8, 7], dtype=int64)
>>> >>> h, e = np.histogramdd((x*x,), bins=4); h
>>> array([25., 10., 8., 7.])
>>>
>>> https://github.com/numpy/numpy/issues/7845 tracks this inconsistency.
>>>
>>> The fix is now trivial: the question is, will changing the return type
>>> break people’s code?
>>>
>>> Either we should:
>>>
>>>    1. Just change it, and hope no one is broken by it
>>>    2. Add a dtype argument:
>>>       - If dtype=None, behave like np.histogram
>>>       - If dtype is not specified, emit a future warning recommending
>>>         to use dtype=None or dtype=float
>>>       - In future, change the default to None
>>>    3. Create a new better-named function histogram_nd, which can also
>>>       be created without the mistake that is
>>>       https://github.com/numpy/numpy/issues/10864.
>>>
>>> Thoughts?
>>>
>>
>> (1) seems like a no-go, taking such risks isn't justified by a minor
>> inconsistency.
>>
>> (2) is still fairly intrusive, you're emitting warnings for everyone and
>> still force people to change their code (and if they don't they may run
>> into a backwards compat break).
>>
>> (3) is the best of these options, however is this really worth a new
>> function? My vote would be "do nothing".
>>
>> Ralf
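
For reference, the two points argued over in this thread can be reproduced
with a short sketch. It assumes a NumPy release that still has the behaviour
described above (histogramdd returning float64), and the variable names are
illustrative only. The first part repeats the quoted comparison and inspects
the dtypes; the second shows the kind of precision loss at stake, namely that
float64 cannot represent every integer above 2**53.

```python
import numpy as np

# 50 evenly spaced samples, as in the quoted comparison.
x = np.linspace(0, 1)

# Unweighted counts: histogram gives an integer array,
# histogramdd gives a float64 array with the same values.
h1, _ = np.histogram(x * x, bins=4)
hdd, _ = np.histogramdd((x * x,), bins=4)
print(h1.dtype, hdd.dtype)

# float64 has a 53-bit significand, so an integer just above 2**53
# does not survive a round trip through a float representation.
big = 2**53 + 1
round_trip = int(np.float64(big))
print(round_trip == big)
```

Whether a loss of this size matters in practice is exactly the disagreement
in the thread; the sketch only shows where it comes from.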
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion