Hi Jacob,

adding to what Chuck mentioned, a few inline comments in case you are
interested in some of the gory details.


On Sat, 2022-03-12 at 21:40 +0000, Jacob Reinhold wrote:
> A pain point I ran into a while ago was assuming that an np.ndarray
> with dtype=np.bool_ would act similarly to the Python built-in
> boolean under addition. This is not the case, as shown in the
> following code snippet:
> 
> >>> np.bool_(True) + True
> True
> >>> True + True
> 2
> 
> In fact, I'm somewhat confused about all the arithmetic operations on
> boolean arrays:
> 
> >>> np.bool_(True) * True
> True
> >>> np.bool_(True) / True
> 1.0
> >>> np.bool_(True) - True
> TypeError: numpy boolean subtract, the `-` operator, is not
> supported, use the bitwise_xor, the `^` operator, or the logical_xor
> function instead.
> >>> for x, y in ((False, False), (False, True), (True, False),
> ...              (True, True)): print(np.bool_(x) ** y, end=" ")
> 1 0 1 1
> 
> I get that addition corresponds to "logical or" and multiplication
> corresponds to "logical and", but I'm lost on the division and
> exponentiation operations given that addition and multiplication
> don't promote the dtype to integers or floats.

I doubt this is historically intentional – or at least these were
choices made fairly pragmatically 10-20 years ago.

But gaining the momentum to change it is hard.  We did, however,
disable `bool - bool`, because it was particularly ill-defined.


If you are interested in the guts of it, there are three types of
behaviors:

1. Functions that are explicitly defined for bool, e.g. `add` and
   `multiply`.  (Check `np.add.types` for a ufunc's type signatures.)

2. Functions which probably never had a conscious decision made and
   do not have a bool implementation: these will usually end up using
   `int8` (e.g. `floor_divide`).

3. A few functions are more explicit.  Subtraction refuses booleans;
   division uses float64 (although int8/int8 -> float64, so that is
   not very special).

The reason is that if there is no boolean implementation, by default
the "next" implementation (e.g. the `int8` one) will be used, leading
to behavior 2.
To get an error instead (e.g. for subtract) we have to refuse booleans
explicitly, as in 3.  That is both complicated and easy to forget.
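
The three cases can be checked directly (a small sketch; the dtypes
shown assume a typical install):

```python
import numpy as np

# 1. add/multiply have an explicit bool loop ('??->?')
print('??->?' in np.add.types)             # True
print(np.bool_(True) + True)               # True, and it stays boolean

# 2. floor_divide has no bool loop, so the int8 loop is picked
print(np.floor_divide(True, True).dtype)   # int8

# 3. subtract refuses booleans explicitly
try:
    np.subtract(True, True)
except TypeError as exc:
    print("refused:", exc)

# ...and true division ends up as float64 (just like int8/int8)
print(np.true_divide(True, True).dtype)    # float64
```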


N.B.:  I have since changed that logic.  "Future" ufuncs are now
reversed: they will default to an error rather than falling back to
the `int8` implementation.
That should make changes easier, but it doesn't really solve the
problem at hand...


> If arrays stubbornly refused to ever change type or interact with
> objects of a different type under addition, that'd be one thing, but
> they do change:
> 
> >>> np.uint8(0) - 1
> -1
> >>> (np.uint8(0) - 1).dtype
> dtype('int64')
> >>> (np.uint8(0) + 0.1).dtype
> dtype('float64')
> 
> This dtype change can also be seen in the division and exponentiation
> above for np.bool_.


This has a subtly different reason:  It is due to "value-based
promotion" and how it works.

How NumPy interprets the `1` depends on the context!  We use a "weak"
(but value-inspecting) logic if the other operand is an _array_:

    np.array([0, 1, 2], dtype=np.uint8) - 1
    # array([255,   0,   1], dtype=uint8)

Where the value inspecting part kicks in for:

    np.array([0, 1, 2], dtype=np.uint8) + 300
    # Will go to uint16

But when the other object is a NumPy scalar or a 0-D array, we do not
currently use that logic.  We instead do:

       np.array(0, dtype=np.uint8) - 1
    => np.array(0, dtype=np.uint8) - np.asarray(1)
    => np.array(0, dtype=np.uint8) - np.array(1, dtype=np.int64)

And that gives you the default integer (usually int64)!
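
Spelled out (the scalar case is exactly the behavior under discussion,
so the result depends on your NumPy version; the array case is stable):

```python
import numpy as np

a = np.array([0, 1, 2], dtype=np.uint8)

# Array op Python int: the array's uint8 dtype wins
print((a - 1).dtype)            # uint8

# NumPy scalar op Python int: with the current logic the Python int
# becomes a default-integer array first, so the result is e.g. int64
print((np.uint8(0) - 1).dtype)
```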

We are considering changing it, but it is a big change I am actively
working on:
https://github.com/numpy/numpy/pull/21103
https://discuss.scientific-python.org/t/poll-future-numpy-behavior-when-mixing-arrays-numpy-scalars-and-python-scalars/202



> 
> Why the discrepancy in behavior for np.bool_? And why are arithmetic
> operations for np.bool_ inconsistently promoted to other data types?
> 
> If all arithmetic operations on np.bool_ resulted in integers, that
> would be consistent (so easier to work with) and wouldn't restrict
> expressiveness because there are also "logical or" (|) and "logical
> and" (&) operations available. Alternatively, division and
> exponentiation could throw errors like subtract, but the discrepancy
> between np.bool_ and the Python built-in bool for addition and
> multiplication would remain.


I am not sure anyone ever seriously tried to change this.

In general, we would probably have to take this pretty slowly, similar
to what Chuck said about subtraction:
1. Make it an error (subtraction is already there)
2. Switch (potentially with a warning first) to making it an integer

Or we just stay with errors of course.

In general, I like the idea of doing something about this, so we
should discuss it!  But I do suspect that in the end we would have to
formalize a proposal.  And some users are bound to be disappointed to
see the current logic go.

Cheers,

Sebastian


> 
> For context, I ran into an issue with this discrepancy in behavior
> while working on an image segmentation problem. For binary
> segmentation problems, we make use of boolean arrays to represent
> where an object is (the locations in the array which are "True"
> correspond to the foreground/object-of-interest, "False" corresponds
> to the background). I was aggregating multiple binary segmentation
> arrays to do a majority vote with an implementation that boiled down
> to the following:
> 
> >>> pred1, pred2, ..., predN = np.array(..., dtype=np.bool_),
> ...     np.array(..., dtype=np.bool_), ..., np.array(...,
> ...     dtype=np.bool_)
> >>> aggregate = (pred1 + pred2 + ... + predN) / N
> >>> agg_pred = aggregate >= 0.5
> 
> Which returned (1.0 / N) in all indices which had at least one "True"
> value in a prediction. I assumed that the arrays would be promoted to
> integers (False -> 0; True -> 1) and added so that agg_pred would
> hold the majority vote result. But agg_pred was always empty because
> the maximum value was (1.0 / N) for N > 2.
> 
> My current "work around" is to remind myself of this discrepancy by
> importing "builtins" from the standard library and annotating the
> relevant functions and variables as using the "builtins.bool" to
> explicitly distinguish it from np.bool_ behavior where applicable,
> and add checks and/or conversions on top of that. But why not make
> np.bool_ act like the built-in bool under addition and
> multiplication  and let users use the already existing | and &
> operations for "logical or" and "logical and"?
> _______________________________________________
> NumPy-Discussion mailing list -- numpy-discussion@python.org
> To unsubscribe send an email to numpy-discussion-le...@python.org
> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> Member address: sebast...@sipsolutions.net
> 
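
P.S.:  For the majority-vote case, stacking the predictions and
letting `mean` do the promotion sidesteps the boolean-addition pitfall
entirely.  A sketch, with random stand-ins for your actual predictions:

```python
import numpy as np

rng = np.random.default_rng(0)
preds = [rng.random(10) > 0.5 for _ in range(5)]  # N boolean predictions

stacked = np.stack(preds)            # shape (N, 10), dtype bool
aggregate = stacked.mean(axis=0)     # float64 fraction of True votes
agg_pred = aggregate >= 0.5          # majority vote
print(aggregate, agg_pred, sep="\n")
```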
