Re: [Numpy-discussion] np.{bool,float,int} deprecation

Ralf Gommers Fri, 11 Dec 2020 02:29:55 -0800

On Fri, Dec 11, 2020 at 9:47 AM Eric Wieser <[email protected]>
wrote:


> >  you might want to discuss this with us at the array API standard
> > https://github.com/data-apis/array-api (which is currently in RFC
> > stage). The spec uses bool as the name for the boolean dtype.
>
> I don't fully understand this argument - `np.bool` is already not the
> boolean dtype. Either:
>
> * The spec is suggesting that `pkg.bool` be some arbitrary object that can
> be passed into a dtype argument and will produce a boolean array.
>   If this is the case, the spec could also just require that
> `dtype=builtins.bool` have this behavior.
>

Yes, this.

> * The spec is suggesting that `pkg.bool` is some rich dtype object.
>   Ignoring the question of whether this should be `np.bool_` or
> `np.dtype(np.bool_)`, it's currently neither, and changing it will break
> users relying on `np.bool(True) is True`.
>   That's not to say this isn't a sensible thing for the specification to
> have, it's just something that numpy can't conform to without breaking code.
>

It can have richer behaviour; there are no constraints there, but it's not
necessary.


> While it would be great if `np.bool_` could be spelt `np.bool`, I really
> don't think we can make that change without a long deprecation first (if at
> all).
>

Given that the standard API would live in a new namespace (for backwards
compatibility we can't possibly introduce it in the main namespace), `bool`
in that namespace can be the NumPy boolean dtype (if desired).

The key point is that `bool_` is a terrible name, and keeping an `np.bool`
that you can use as a dtype specifier is desirable.
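
To illustrate the distinction between a dtype specifier and the scalar type,
here is a minimal sketch, assuming a reasonably recent NumPy. It deliberately
avoids the `np.bool` name itself, since what that name refers to depends on
the NumPy version; the variable names are illustrative only.

```python
import numpy as np

# The builtin `bool` works as a dtype specifier and yields NumPy's boolean dtype.
from_builtin = np.zeros(3, dtype=bool)
from_numpy = np.zeros(3, dtype=np.bool_)
same_dtype = from_builtin.dtype == from_numpy.dtype == np.dtype(np.bool_)

# The scalar types differ: the builtin returns the Python singleton,
# while `np.bool_` returns a NumPy scalar object.
builtin_is_true = bool(True) is True
numpy_is_true = np.bool_(True) is True

print(same_dtype, builtin_is_true, numpy_is_true)
```

So as a dtype specifier the two spellings are interchangeable, while code
that relies on the identity of the returned scalar would notice a switch.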

Cheers,
Ralf


> Eric
>
> On Thu, 10 Dec 2020 at 20:00, Sebastian Berg <[email protected]>
> wrote:
>
>> On Thu, 2020-12-10 at 20:38 +0100, Ralf Gommers wrote:
>> > On Thu, Dec 10, 2020 at 7:25 PM Sebastian Berg <
>> > [email protected]>
>> > wrote:
>> >
>> > > On Wed, 2020-12-09 at 13:37 -0800, Stephan Hoyer wrote:
>> > > > On Wed, Dec 9, 2020 at 1:07 PM Aaron Meurer <[email protected]>
>> > > > wrote:
>> > > >
>> > > > > On Wed, Dec 9, 2020 at 9:41 AM Sebastian Berg
>> > > > > <[email protected]> wrote:
>> > > > > >
>> > > > > > On Mon, 2020-12-07 at 14:18 -0700, Aaron Meurer wrote:
>> > > > > > > Regarding np.bool specifically, if you want to deprecate this, you
>> > > > > > > might want to discuss this with us at the array API standard
>> > > > > > > https://github.com/data-apis/array-api (which is currently in RFC
>> > > > > > > stage). The spec uses bool as the name for the boolean dtype.
>> > > > > > >
>> > > > > > > Would it make sense for NumPy to change np.bool to just be the boolean
>> > > > > > > dtype object? Unlike int and float, there is no ambiguity with bool,
>> > > > > > > and NumPy clearly doesn't have any issues with shadowing builtin names
>> > > > > > > in its namespace.
>> > > > > >
>> > > > > > We could keep the Python alias around (which for `dtype=` is the same
>> > > > > > as `np.bool_`).
>> > > > > >
>> > > > > > I am not sure I like the idea of immediately shadowing the builtin.
>> > > > > > That is a switch we can avoid flipping (without warning); `np.bool_`
>> > > > > > and `bool` are fairly different beasts? [1]
>> > > > >
>> > > > > NumPy already shadows a lot of builtins, in many cases, in ways that
>> > > > > are incompatible with existing ones. It's not something I would have
>> > > > > done personally, but it's been this way for a long time.
>> > > > >
>> > > >
>> > > > It may be defensible to keep np.bool as an alias for Python's bool
>> > > > even when we remove the other aliases.
>> > >
>> >
>> > I'd agree with that.
>> >
>> >
>> > > That is true, `int` is probably the most confusing, since it is not at
>> > > all compatible with a Python integer, but rather the "default" integer
>> > > (which happens to be the same as C `long` currently).
>> > >
>> > > So we could focus on `np.int`, `np.long`.  I am a bit unsure whether
>> > > you would prefer that or are mainly pointing out the possibility?
>> > >
>> >
>> > Not sure what you mean with focus, focus on describing in the release
>> > notes? Deprecating `np.int` seems like the most beneficial part of this
>> > whole exercise.
>> >
>>
>> I meant limiting the current deprecation to `np.int`, maybe `np.long`,
>> and a "carefully chosen" set.
>> To be honest, I don't mind either way, so any stronger opinion will tip
>> the scale for me personally (my default currently is to update the
>> release notes to recommend the more descriptive names).
>>
>> There are probably more doc updates that would be nice, I will suggest
>> updating a separate issue for that.
>>
>>
>> > > Right now, my main take-away from the discussion is that it would be
>> > > good to clarify the release notes a bit more.
>> > >
>> > > Using `float` for a dtype seems fine to me, but I prefer mentioning
>> > > `np.float64` over `np.float_`.
>> > > For integers, I wonder if we should also suggest `np.int64`, even – or
>> > > because – if the default integer on many systems is currently
>> > > `np.int_`?
>> > >
>> >
>> > I agree. I think we should recommend sane, descriptive names that do the
>> > right thing. So ideally we'd have people spell their dtype specifiers as
>> >   dtype=bool  # or np.bool
>> >   dtype=np.float64
>> >   dtype=np.int64
>> >   dtype=np.complex128
>> > The names with underscores at the end make little sense from a UX
>> > perspective. And the C equivalents (single/double/etc) made sense 15 years
>> > ago, but with the user base of today - the majority of whom will not know C
>> > fluently or at all - also don't make too much sense.
>> >
>> > The `dtype=int` or `dtype=np.int_` behaviour flopping between 32 and 64
>> > bits is likely to be a pitfall much more often than it is what the user
>> > actually needs, so shouldn't be recommended and probably deserves a warning
>> > in the docs.
>>
>> Right, there is one slight trickery because `np.intp` is often a great
>> integer dtype to use, because it is the integer that NumPy uses for all
>> things related to indexing and array sizes.
>> (I would be happy to dig out my PR making `np.intp` the default NumPy
>> integer.)
>>
>> Cheers,
>>
>> Sebastian
>>
>>
>> >
>> > Cheers,
>> > Ralf
>> >
>> >
>> > >
>> > > >
>> > > > np.int_ and np.float_ have fixed precision, which makes them somewhat
>> > > > different from the builtin types. NumPy has a whole bunch of different
>> > > > precisions for integer and floats, so this distinction matters.
>> > > >
>> > > > In contrast, there is only one boolean dtype in NumPy, which matches
>> > > > Python's bool. So we wouldn't have to worry, for example, about whether a
>> > > > user has requested a specific precision explicitly. This comes up in issues
>> > > > like type-promotion where libraries like JAX and PyTorch have special case
>> > > > logic for most Python types vs NumPy dtypes (but booleans are the same for
>> > > > both):
>> > > > https://jax.readthedocs.io/en/latest/type_promotion.html
>> > >
>> > >
>> > _______________________________________________
>> > NumPy-Discussion mailing list
>> > [email protected]
>> > https://mail.python.org/mailman/listinfo/numpy-discussion
