A pain point I ran into a while ago was assuming that an np.ndarray with dtype=np.bool_ would act similarly to the Python built-in bool under addition. This is not the case, as shown in the following code snippet:
>>> np.bool_(True) + True
True
>>> True + True
2

In fact, I'm somewhat confused about all the arithmetic operations on boolean arrays:

>>> np.bool_(True) * True
True
>>> np.bool_(True) / True
1.0
>>> np.bool_(True) - True
TypeError: numpy boolean subtract, the `-` operator, is not supported, use the bitwise_xor, the `^` operator, or the logical_xor function instead.
>>> for x, y in ((False, False), (False, True), (True, False), (True, True)):
...     print(np.bool_(x) ** y, end=" ")
...
1 0 1 1

I get that addition corresponds to "logical or" and multiplication corresponds to "logical and", but I'm lost on the division and exponentiation operations, given that addition and multiplication don't promote the dtype to integers or floats. If arrays stubbornly refused to ever change type or interact with objects of a different type under addition, that would be one thing, but they do change:

>>> np.uint8(0) - 1
-1
>>> (np.uint8(0) - 1).dtype
dtype('int64')
>>> (np.uint8(0) + 0.1).dtype
dtype('float64')

This dtype change can also be seen in the division and exponentiation on np.bool_ above. Why the discrepancy in behavior for np.bool_? And why are arithmetic operations on np.bool_ inconsistently promoted to other data types? If all arithmetic operations on np.bool_ resulted in integers, that would be consistent (and so easier to work with), and it wouldn't restrict expressiveness because the "logical or" (|) and "logical and" (&) operations are also available. Alternatively, division and exponentiation could raise errors like subtraction does, though the discrepancy between np.bool_ and the Python built-in bool for addition and multiplication would remain.

For context, I ran into this discrepancy while working on an image segmentation problem. For binary segmentation problems, we use boolean arrays to represent where an object is: the locations in the array that are "True" correspond to the foreground/object-of-interest, and "False" corresponds to the background. I was aggregating multiple binary segmentation arrays to take a majority vote, with an implementation that boiled down to the following:

>>> pred1, pred2, ..., predN = np.array(..., dtype=np.bool_), np.array(..., dtype=np.bool_), ..., np.array(..., dtype=np.bool_)
>>> aggregate = (pred1 + pred2 + ... + predN) / N
>>> agg_pred = aggregate >= 0.5

This returned (1.0 / N) at every index that had at least one "True" value in any prediction. I had assumed that the arrays would be promoted to integers (False -> 0; True -> 1) and summed, so that agg_pred would hold the majority vote result. Instead, agg_pred never contained a single True value, because the maximum of aggregate was (1.0 / N), which is below 0.5 for N > 2.

My current workaround is to remind myself of this discrepancy by importing the standard-library "builtins" module and annotating the relevant functions and variables with "builtins.bool" to explicitly distinguish them from np.bool_ where applicable, and to add checks and/or conversions on top of that. But why not make np.bool_ act like the built-in bool under addition and multiplication, and let users use the existing | and & operators for "logical or" and "logical and"?
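For anyone who hits the same thing, here is a minimal, self-contained sketch of the conversion-based fix I mean; the tiny prediction arrays (pred1, pred2, pred3) are made up purely for illustration:

>>> import numpy as np
>>> # Toy boolean "prediction" masks (hypothetical); True marks foreground.
>>> pred1 = np.array([True,  True,  False, False])
>>> pred2 = np.array([True,  False, True,  False])
>>> pred3 = np.array([True,  False, False, False])
>>> preds = [pred1, pred2, pred3]
>>> # Adding the bool arrays just ORs them, so every foreground index sits at
>>> # 1/N and the >= 0.5 threshold never fires for N > 2:
>>> ((pred1 + pred2 + pred3) / len(preds)) >= 0.5
array([False, False, False, False])
>>> # Converting to an integer dtype before summing gives the intended vote count:
>>> votes = sum(p.astype(np.uint8) for p in preds)   # per-index count of True votes
>>> (votes / len(preds)) >= 0.5
array([ True, False, False, False])

An equivalent alternative is np.mean(np.stack(preds), axis=0) >= 0.5, since np.mean promotes the stacked boolean array to float when averaging.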