On Mon, Apr 9, 2018 at 10:24 PM, Eric Wieser <wieser.eric+nu...@gmail.com> wrote:
> Numpy has three histogram functions - histogram, histogram2d, and
> histogramdd.
>
> histogram is by far the most widely used, and in the absence of weights
> and normalization, returns an np.intp count for each bin.
>
> histogramdd (for which histogram2d is a wrapper) returns np.float64 in
> all circumstances.
>
> As a contrived comparison
>
> >>> x = np.linspace(0, 1)
> >>> h, e = np.histogram(x*x, bins=4); h
> array([25, 10, 8, 7], dtype=int64)
> >>> h, e = np.histogramdd((x*x,), bins=4); h
> array([25., 10., 8., 7.])
>
> https://github.com/numpy/numpy/issues/7845 tracks this inconsistency.
>
> The fix is now trivial: the question is, will changing the return type
> break people’s code?
>
> Either we should:
>
> 1. Just change it, and hope no one is broken by it
> 2. Add a dtype argument:
>    - If dtype=None, behave like np.histogram
>    - If dtype is not specified, emit a future warning recommending to
>      use dtype=None or dtype=float
>    - In future, change the default to None
> 3. Create a new better-named function histogram_nd, which can also be
>    created without the mistake that is
>    https://github.com/numpy/numpy/issues/10864.
>
> Thoughts?
>

(1) seems like a no-go; taking such risks isn't justified by a minor inconsistency.

(2) is still fairly intrusive: you're emitting warnings for everyone and still forcing people to change their code (and if they don't, they may run into a backwards-compatibility break).

(3) is the best of these options; however, is this really worth a new function?

My vote would be "do nothing".

Ralf
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion