On Fri, Dec 11, 2020 at 9:47 AM Eric Wieser <wieser.eric+nu...@gmail.com> wrote:
> > you might want to discuss this with us at the array API standard
> > https://github.com/data-apis/array-api (which is currently in RFC
> > stage). The spec uses bool as the name for the boolean dtype.
>
> I don't fully understand this argument - `np.bool` is already not the
> boolean dtype. Either:
>
> * The spec is suggesting that `pkg.bool` be some arbitrary object that can
>   be passed into a dtype argument and will produce a boolean array.
>   If this is the case, the spec could also just require that
>   `dtype=builtins.bool` have this behavior.
>

Yes, this.

> * The spec is suggesting that `pkg.bool` is some rich dtype object.
>   Ignoring the question of whether this should be `np.bool_` or
>   `np.dtype(np.bool_)`, it's currently neither, and changing it will break
>   users relying on `np.bool(True) is True`.
>   That's not to say this isn't a sensible thing for the specification to
>   have, it's just something that numpy can't conform to without breaking code.
>

It can have richer behaviour, there's no constraints there - but it's not
necessary.

> While it would be great if `np.bool_` could be spelt `np.bool`, I really
> don't think we can make that change without a long deprecation first (if at
> all).
>

Given that that standard API would be in a new namespace (given backwards
compat we can't possibly introduce it in the main namespace), there `bool`
can be the numpy boolean dtype (if desired). The key point is that `bool_`
is a terrible name, and keeping `np.bool` that you can use as a dtype
specifier is desirable.
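
To make the two readings above concrete, here is a minimal sketch of the difference between "something you can pass as a dtype specifier" and "a rich scalar/dtype object". It is only an illustration, not something taken from the array API spec. It uses the builtin `bool` and `np.bool_` and deliberately avoids the `np.bool` alias, whose availability and meaning depend on the NumPy version; the variable names are made up for the example.

```python
import numpy as np

# As dtype specifiers, the builtin `bool` and `np.bool_` resolve to the
# same boolean dtype.
a = np.zeros(3, dtype=bool)
b = np.zeros(3, dtype=np.bool_)
same_dtype = a.dtype == b.dtype == np.dtype(np.bool_)

# As scalar types they differ: calling the builtin on True returns the
# True singleton, while np.bool_(True) is a separate NumPy scalar object
# that is not an instance of Python's bool.
builtin_is_singleton = bool(True) is True
numpy_is_singleton = np.bool_(True) is True
numpy_is_python_bool = isinstance(np.bool_(True), bool)

print(same_dtype, builtin_is_singleton, numpy_is_singleton, numpy_is_python_bool)
```

This is why code relying on the identity check would break if the name pointed at the NumPy scalar type, while code that only passes it to `dtype=` would not notice.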

Cheers,
Ralf

> Eric
>
> On Thu, 10 Dec 2020 at 20:00, Sebastian Berg <sebast...@sipsolutions.net>
> wrote:
>
>> On Thu, 2020-12-10 at 20:38 +0100, Ralf Gommers wrote:
>> > On Thu, Dec 10, 2020 at 7:25 PM Sebastian Berg
>> > <sebast...@sipsolutions.net> wrote:
>> >
>> > > On Wed, 2020-12-09 at 13:37 -0800, Stephan Hoyer wrote:
>> > > > On Wed, Dec 9, 2020 at 1:07 PM Aaron Meurer <asmeu...@gmail.com>
>> > > > wrote:
>> > > >
>> > > > > On Wed, Dec 9, 2020 at 9:41 AM Sebastian Berg
>> > > > > <sebast...@sipsolutions.net> wrote:
>> > > > > >
>> > > > > > On Mon, 2020-12-07 at 14:18 -0700, Aaron Meurer wrote:
>> > > > > > > Regarding np.bool specifically, if you want to deprecate
>> > > > > > > this, you might want to discuss this with us at the array
>> > > > > > > API standard https://github.com/data-apis/array-api (which
>> > > > > > > is currently in RFC stage). The spec uses bool as the name
>> > > > > > > for the boolean dtype.
>> > > > > > >
>> > > > > > > Would it make sense for NumPy to change np.bool to just be
>> > > > > > > the boolean dtype object? Unlike int and float, there is no
>> > > > > > > ambiguity with bool, and NumPy clearly doesn't have any
>> > > > > > > issues with shadowing builtin names in its namespace.
>> > > > > >
>> > > > > > We could keep the Python alias around (which for `dtype=` is
>> > > > > > the same as `np.bool_`).
>> > > > > >
>> > > > > > I am not sure I like the idea of immediately shadowing the
>> > > > > > builtin. That is a switch we can avoid flipping (without
>> > > > > > warning); `np.bool_` and `bool` are fairly different beasts? [1]
>> > > > >
>> > > > > NumPy already shadows a lot of builtins, in many cases, in ways
>> > > > > that are incompatible with existing ones. It's not something I
>> > > > > would have done personally, but it's been this way for a long
>> > > > > time.
>> > > > >
>> > > >
>> > > > It may be defensible to keep np.bool as an alias for Python's
>> > > > bool even when we remove the other aliases.
>> > >
>> >
>> > I'd agree with that.
>> >
>> >
>> > > That is true, `int` is probably the most confusing, since it is not
>> > > at all compatible to a Python integer, but rather the "default"
>> > > integer (which happens to be the same as C `long` currently).
>> > >
>> > > So we could focus on `np.int`, `np.long`. I am a bit unsure whether
>> > > you would prefer that or are mainly pointing out the possibility?
>> > >
>> >
>> > Not sure what you mean with focus, focus on describing in the release
>> > notes? Deprecating `np.int` seems like the most beneficial part of
>> > this whole exercise.
>> >
>>
>> I meant limiting the current deprecation to `np.int`, maybe `np.long`,
>> and a "carefully chosen" set.
>> To be honest, I don't mind either way, so any stronger opinion will tip
>> the scale for me personally (my default currently is to update the
>> release notes to recommend the more descriptive names).
>>
>> There are probably more doc updates that would be nice, I will suggest
>> updating a separate issue for that.
>>
>>
>> > > Right now, my main take-away from the discussion is that it would
>> > > be good to clarify the release notes a bit more.
>> > >
>> > > Using `float` for a dtype seems fine to me, but I prefer mentioning
>> > > `np.float64` over `np.float_`.
>> > > For integers, I wonder if we should also suggest `np.int64`, even –
>> > > or because – if the default integer on many systems is currently
>> > > `np.int_`?
>> > >
>> >
>> > I agree. I think we should recommend sane, descriptive names that do
>> > the right thing. So ideally we'd have people spell their dtype
>> > specifiers as
>> >     dtype=bool  # or np.bool
>> >     dtype=np.float64
>> >     dtype=np.int64
>> >     dtype=np.complex128
>> > The names with underscores at the end make little sense from a UX
>> > perspective. And the C equivalents (single/double/etc) made sense 15
>> > years ago, but with the user base of today - the majority of whom
>> > will not know C fluently or at all - also don't make too much sense.
>> >
>> > The `dtype=int` or `dtype=np.int_` behaviour flopping between 32 and
>> > 64 bits is likely to be a pitfall much more often than it is what the
>> > user actually needs, so shouldn't be recommended and probably
>> > deserves a warning in the docs.
>>
>> Right, there is one slight trickery because `np.intp` is often a great
>> integer dtype to use, because it is the integer that NumPy uses for all
>> things related to indexing and array sizes.
>> (I would be happy to dig out my PR making `np.intp` the default NumPy
>> integer.)
>>
>> Cheers,
>>
>> Sebastian
>>
>>
>> >
>> > Cheers,
>> > Ralf
>> >
>> >
>> > > >
>> > > > np.int_ and np.float_ have fixed precision, which makes them
>> > > > somewhat different from the builtin types. NumPy has a whole
>> > > > bunch of different precisions for integer and floats, so this
>> > > > distinction matters.
>> > > >
>> > > > In contrast, there is only one boolean dtype in NumPy, which
>> > > > matches Python's bool. So we wouldn't have to worry, for example,
>> > > > about whether a user has requested a specific precision
>> > > > explicitly. This comes up in issues like type-promotion where
>> > > > libraries like JAX and PyTorch have special case logic for most
>> > > > Python types vs NumPy dtypes (but booleans are the same for
>> > > > both):
>> > > > https://jax.readthedocs.io/en/latest/type_promotion.html
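
As a small illustration of the point in the quoted thread about explicit widths versus the platform-dependent default integer, the sketch below inspects item sizes. It assumes nothing beyond a working NumPy install; the variable names are made up for the example, and the exact width reported for `int` and `np.intp` depends on the platform and the NumPy version.

```python
import numpy as np

# An explicit width is the same everywhere: 8 bytes for int64.
fixed = np.dtype(np.int64)

# `dtype=int` maps to NumPy's default integer, whose width is
# platform-dependent (and also differs between NumPy versions).
default = np.dtype(int)

# `np.intp` is the integer NumPy uses for indexing and array sizes.
index = np.dtype(np.intp)

print("int64  :", fixed.itemsize, "bytes")
print("int    :", default.itemsize, "bytes")
print("intp   :", index.itemsize, "bytes")
```

On a given machine the last two lines may or may not agree with the first, which is the pitfall described above.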
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion