Re: [Numpy-discussion] Changing the return type of np.histogramdd

Ralf Gommers Wed, 25 Apr 2018 22:51:15 -0700

On Wed, Apr 25, 2018 at 10:07 PM, Eric Wieser <[email protected]>
wrote:


> what does that gain over having the user do something like result.astype()
>
> It means that the user can use integer weights without worrying about
> losing precision due to an intermediate float representation.
>
> It also means they can use higher precision values (np.longdouble) or
> complex weights.
>
None of that seems particularly important to be honest.

> you’re emitting warnings for everyone
>
> When there’s a risk of precision loss, that seems like the responsible
> thing to do.
>
For precision loss of the order of float64 eps, I disagree. There will be
many such places in numpy and in other core libraries.
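
To make the size of the effect concrete, here is a minimal sketch of what an intermediate float64 representation does to integer weights. The weight values are chosen purely for illustration and do not come from any report in this thread; the point is only that integer sums stay exact up to 2**53, and beyond that the error is on the order of float64 eps relative to the total.

```python
import numpy as np

# Integer weights whose exact sum is 2**53 + 1, the first positive integer
# that float64 cannot represent exactly (illustrative values only).
weights = np.array([2**53, 1], dtype=np.int64)

exact = int(weights.sum())                         # integer accumulation
via_float = int(weights.astype(np.float64).sum())  # float64 accumulation

print(exact - via_float)         # 1
print(np.finfo(np.float64).eps)  # 2.220446049250313e-16, i.e. 2**-52
```

The absolute error is 1 on a total of about 9e15, so the relative error is roughly 1.1e-16, about half of eps.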


> Users passing float weights would see no warning, I suppose.
>
> is this really worth a new function
>
> There ought to be a function for computing histograms with integer weights
> that doesn’t lose precision. Either we change the existing function to do
> that, or we make a new function.
>
It's also possible to refer users to scipy.stats.binned_statistic(_2d/dd),
which provides a superset of the histogram functionality and is internally
consistent because the implementations of 1d/2d call the dd one.
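
As a rough sketch of that suggestion, assuming SciPy is installed: a weighted histogram is the per-bin sum of the weights, so binned_statistic_dd with statistic="sum" should reproduce what histogramdd returns with a weights argument. The sample data, bin count, and variable names below are made up for illustration.

```python
import numpy as np
from scipy import stats

gen = np.random.default_rng(0)
sample = gen.random((200, 2))           # 200 points in the unit square
weights = gen.integers(1, 5, size=200)  # small integer weights (1..4)

bins = 4
bin_range = [(0.0, 1.0), (0.0, 1.0)]    # explicit range so both calls share edges

# NumPy: weighted N-dimensional histogram.
h_np, edges_np = np.histogramdd(sample, bins=bins, range=bin_range,
                                weights=weights)

# SciPy: the same quantity expressed as a per-bin sum of the weights.
res = stats.binned_statistic_dd(sample, weights, statistic="sum",
                                bins=bins, range=bin_range)

print(np.allclose(res.statistic, h_np))
```

Note that this does not by itself settle the dtype question: the sketch only compares values, not the dtype of the returned arrays.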

Ralf



> A possible compromise: like 1, but only change the dtype of the result if
> a weights argument is passed.
>
> #10864 <https://github.com/numpy/numpy/issues/10864> seems like a
> worrying design flaw too, but I suppose that can be dealt with separately.
>
> Eric
> 
>
> On Wed, 25 Apr 2018 at 21:57 Ralf Gommers <[email protected]> wrote:
>
>> On Mon, Apr 9, 2018 at 10:24 PM, Eric Wieser <[email protected]
>> > wrote:
>>
>>> Numpy has three histogram functions - histogram, histogram2d, and
>>> histogramdd.
>>>
>>> histogram is by far the most widely used, and in the absence of weights
>>> and normalization, returns an np.intp count for each bin.
>>>
>>> histogramdd (for which histogram2d is a wrapper) returns np.float64 in
>>> all circumstances.
>>>
>>> As a contrived comparison
>>>
>>> >>> x = np.linspace(0, 1)
>>> >>> h, e = np.histogram(x*x, bins=4); h
>>> array([25, 10,  8,  7], dtype=int64)
>>> >>> h, e = np.histogramdd((x*x,), bins=4); h
>>> array([25., 10.,  8.,  7.])
>>>
>>> https://github.com/numpy/numpy/issues/7845 tracks this inconsistency.
>>>
>>> The fix is now trivial: the question is, will changing the return type
>>> break people’s code?
>>>
>>> Either we should:
>>>
>>>    1. Just change it, and hope no one is broken by it
>>>    2. Add a dtype argument:
>>>       - If dtype=None, behave like np.histogram
>>>       - If dtype is not specified, emit a future warning recommending
>>>       to use dtype=None or dtype=float
>>>       - In future, change the default to None
>>>    3. Create a new better-named function histogram_nd, which can also
>>>    be created without the mistake that is
>>>    https://github.com/numpy/numpy/issues/10864.
>>>
>>> Thoughts?
>>>
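
Option 2 above could look roughly like the following sketch. It is a hypothetical wrapper, not NumPy's API: the name histogramdd_compat, the _NoValue sentinel, and the warning text are all assumptions made for illustration. It also only casts the float64 result after the fact, so it shows the deprecation mechanics and does nothing about precision.

```python
import warnings
import numpy as np

_NoValue = object()  # hypothetical sentinel meaning "dtype was not passed"


def histogramdd_compat(sample, bins=10, range=None, weights=None,
                       dtype=_NoValue):
    """Hypothetical wrapper sketching option 2; not part of NumPy."""
    if dtype is _NoValue:
        # Caller did not choose: warn, then keep today's float64 result.
        warnings.warn(
            "the default result dtype may change; pass dtype=None or "
            "dtype=float to choose explicitly",
            FutureWarning,
            stacklevel=2,
        )
        dtype = float
    hist, edges = np.histogramdd(sample, bins=bins, range=range,
                                 weights=weights)
    if dtype is None:
        # Behave like np.histogram: intp counts when unweighted,
        # otherwise follow the dtype of the weights.
        dtype = np.intp if weights is None else np.asarray(weights).dtype
    return hist.astype(dtype), edges


x = np.linspace(0, 1)
h, e = histogramdd_compat((x * x,), bins=4, dtype=None)
print(h.dtype == np.intp, h.tolist())  # True [25, 10, 8, 7]
```

Calling the wrapper without dtype emits the FutureWarning on every call, which is the cost raised against this option in the reply that follows.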
>>
>> (1) seems like a no-go; taking such risks isn't justified by a minor
>> inconsistency.
>>
>> (2) is still fairly intrusive: you're emitting warnings for everyone and
>> still forcing people to change their code (and if they don't, they may
>> run into a backwards compat break).
>>
>> (3) is the best of these options, however is this really worth a new
>> function? My vote would be "do nothing".
>>
>> Ralf
>>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> [email protected]
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>
>
>
>

