A pain point I ran into a while ago was assuming that an np.ndarray with 
dtype=np.bool_ would act similarly to the Python built-in boolean under 
addition. This is not the case, as shown in the following code snippet:

>>> np.bool_(True) + True
True
>>> True + True
2

In fact, I'm somewhat confused about all the arithmetic operations on boolean 
arrays:

>>> np.bool_(True) * True
True
>>> np.bool_(True) / True
1.0
>>> np.bool_(True) - True
TypeError: numpy boolean subtract, the `-` operator, is not supported, use the 
bitwise_xor, the `^` operator, or the logical_xor function instead.
>>> for x, y in ((False, False), (False, True), (True, False), (True, True)):
...     print(np.bool_(x) ** y, end=" ")
...
1 0 1 1

I get that addition corresponds to "logical or" and multiplication corresponds 
to "logical and", but I'm lost on the division and exponentiation operations 
given that addition and multiplication don't promote the dtype to integers or 
floats.
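
To make the inconsistency easier to see at a glance, here is a small sketch that records the result dtype of each operator applied to two NumPy booleans. The names t and result_dtypes are only for illustration, and the exact dtypes printed may depend on the NumPy version; on the versions I'm aware of, addition and multiplication stay boolean, division gives a float, and exponentiation gives an integer type.

```python
import numpy as np

t = np.bool_(True)

# Result dtype of each arithmetic operator applied to two NumPy booleans.
result_dtypes = {
    "+": (t + t).dtype,    # behaves like logical or
    "*": (t * t).dtype,    # behaves like logical and
    "/": (t / t).dtype,    # promoted to a floating-point dtype
    "**": (t ** t).dtype,  # promoted to an integer dtype
}

for op, dtype in result_dtypes.items():
    print(op, dtype)
```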

If arrays stubbornly refused to ever change type or interact with objects of a 
different type under addition, that'd be one thing, but they do change:

>>> np.uint8(0) - 1
-1
>>> (np.uint8(0) - 1).dtype
dtype('int64')
>>> (np.uint8(0) + 0.1).dtype
dtype('float64')

This dtype change can also be seen in the division and exponentiation above for 
np.bool_.

Why the discrepancy in behavior for np.bool_? And why are arithmetic operations 
for np.bool_ inconsistently promoted to other data types?

If all arithmetic operations on np.bool_ resulted in integers, that would be 
consistent (and so easier to work with) and wouldn't restrict expressiveness, 
because the "logical or" (|) and "logical and" (&) operators are also 
available. Alternatively, division and exponentiation could raise errors the 
way subtraction does, but then the discrepancy between np.bool_ and the Python 
built-in bool for addition and multiplication would remain.

For context, I ran into an issue with this discrepancy in behavior while 
working on an image segmentation problem. For binary segmentation problems, we 
make use of boolean arrays to represent where an object is (the locations in 
the array which are "True" correspond to the foreground/object-of-interest, 
"False" corresponds to the background). I was aggregating multiple binary 
segmentation arrays to do a majority vote with an implementation that boiled 
down to the following:

>>> pred1, pred2, ..., predN = np.array(..., dtype=np.bool_), np.array(..., dtype=np.bool_), ..., np.array(..., dtype=np.bool_)
>>> aggregate = (pred1 + pred2 + ... + predN) / N
>>> agg_pred = aggregate >= 0.5

Because "+" on boolean arrays acts as "logical or", the sum was itself a 
boolean array, so aggregate held (1.0 / N) at every index where at least one 
prediction was "True" and 0.0 elsewhere. I had assumed that the arrays would be 
promoted to integers (False -> 0; True -> 1) and summed, so that agg_pred would 
hold the majority vote result. Instead, agg_pred was False everywhere, because 
the maximum value of aggregate was (1.0 / N), which is below 0.5 for N > 2.
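
For comparison, here is a minimal sketch of a majority vote that counts the "True" votes as integers before dividing. The function name majority_vote and the three small example arrays are made up for illustration; this is one possible way to avoid the boolean addition, not necessarily the best one.

```python
import numpy as np


def majority_vote(preds):
    """Return a mask that is True where at least half of the masks are True."""
    stacked = np.stack(preds)                     # shape (N, ...), dtype bool
    votes = stacked.sum(axis=0, dtype=np.int64)   # count of True votes per index
    return votes / len(preds) >= 0.5


pred1 = np.array([True, True, False, False])
pred2 = np.array([True, False, False, True])
pred3 = np.array([True, True, False, False])

# Naive version: bool + bool is a logical or, so the "sum" never exceeds True.
naive = (pred1 + pred2 + pred3) / 3 >= 0.5

# Integer-count version: True is counted as 1 and False as 0 before dividing.
voted = majority_vote([pred1, pred2, pred3])

print(naive)
print(voted)
```

With these inputs the naive version is False everywhere, while the integer-count version is True only at the two indices where at least two of the three predictions agree on "True".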

My current workaround is to remind myself of this discrepancy by importing 
"builtins" from the standard library and annotating the relevant functions and 
variables with "builtins.bool", to explicitly distinguish them from np.bool_ 
where applicable, and then adding checks and/or conversions on top of that. 
But why not make np.bool_ act like the built-in bool under addition and 
multiplication, and let users rely on the already existing | and & operators 
for "logical or" and "logical and"?