On Mon, 2020-07-06 at 12:39 -0600, Aaron Meurer wrote:
> I've been trying to figure out this behavior. It doesn't seem to be
> documented at
> https://numpy.org/doc/stable/reference/arrays.indexing.html
>
> >>> a = np.empty((2, 3))
> >>> a.shape
> (2, 3)
> >>> a[True].shape
> (1, 2, 3)
> >>> a[False].shape
> (0, 2, 3)
>
> It seems like indexing with a raw boolean (True or False) adds an
> axis with a dimension 1 or 0, respectively.
>
> Except it only works once:
>
> >>> a[:,False]
> array([], shape=(2, 0, 3), dtype=float64)
> >>> a[:,False, False]
> array([], shape=(2, 0, 3), dtype=float64)
> >>> a[:,False,True].shape
> (2, 0, 3)
> >>> a[:,True,False].shape
> (2, 0, 3)
>
> The docs say "A single boolean index array is practically identical
> to x[obj.nonzero()]". I have a hard time seeing this as an extension
> of that, since indexing by `np.nonzero(False)` or `np.nonzero(True)`
> *replaces* the given axis.
>
> >>> a[np.nonzero(True)].shape
> (1, 3)
> >>> a[np.nonzero(False)].shape
> (0, 3)
>
> I think at best this behavior should be documented. I'm trying to
> understand the motivation for it, or if it's even intentional. And in
> particular, why do multiple boolean indices not insert multiple axes?
> It would actually be useful to be able to generically add length 0
> axes using an index, similar to how `newaxis` adds a length 1 axis.
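For reference, the shapes from the quoted session can be reproduced with the short script below. It is a sketch that assumes a NumPy version with the behavior described in this thread. It deliberately avoids calling `np.nonzero` on 0-D input, because NumPy has since deprecated that.

```python
import numpy as np

a = np.empty((2, 3))
print(a.shape)                   # (2, 3)

# A single 0-D boolean inserts one new axis of length 1 (True) or 0 (False).
print(a[True].shape)             # (1, 2, 3)
print(a[False].shape)            # (0, 2, 3)
print(a[:, False].shape)         # (2, 0, 3)

# Several 0-D booleans are all advanced indices, so they are broadcast
# together and yield a single new axis rather than one axis each.
print(a[:, False, False].shape)  # (2, 0, 3)
print(a[:, False, True].shape)   # (2, 0, 3)
print(a[:, True, False].shape)   # (2, 0, 3)
print(a[:, True, True].shape)    # (2, 1, 3)
```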
It's fully intentional, as it is the correct generalization from an N-D boolean index to a 0-D boolean index. To be fair, there is a footnote in the "Detailed notes" saying that "the nonzero equivalence for Boolean arrays does not hold for zero dimensional boolean arrays." This is for technical reasons, since `nonzero` does not do anything useful for 0-D input.

In any case, a boolean index always does the following:

1. It *removes as many dimensions as the index has, because this is the number of dimensions effectively indexed by it*.
2. It adds a single new dimension at the same place. The length of this new dimension is the number of `True` elements.
3. If you have multiple advanced indices, you get annoying broadcasting of all of them. That is *always* confusing for boolean indices. 0-D should not be too special there...

And this generalizes to 0-D just as well, even if it may be a bit surprising at first. A 0-D boolean removes zero dimensions and adds one new dimension of length 1 (`True`) or 0 (`False`). Several 0-D booleans are broadcast together like any other advanced indices, so together they produce a single new dimension, not one each.

I have written much of this more clearly before in this NEP, which may be a good read to _really_ understand it:

https://numpy.org/neps/nep-0021-advanced-indexing.html

In general, I wonder whether it is good to go into much depth about how 0-D arrays are not actually handled very specially. Yes, it's confusing on its own, but it also seems a bit like overloading the user with unnecessary knowledge?

Cheers,

Sebastian
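Rules 1 and 2 above can be checked with a minimal sketch for a single boolean index applied to the leading axes. `bool_index_shape` is a hypothetical helper written only for illustration; it is not a NumPy API. The loop compares its prediction with what NumPy actually returns for 2-D, 1-D and 0-D masks.

```python
import numpy as np

def bool_index_shape(arr_shape, mask):
    """Hypothetical helper (not part of NumPy): predict arr[mask].shape
    for one boolean mask applied to the leading axes of an array."""
    mask = np.asarray(mask, dtype=bool)
    # Rule 1: remove as many dimensions as the mask has (zero for 0-D).
    remaining = tuple(arr_shape[mask.ndim:])
    # Rule 2: add one new dimension whose length is the number of True values.
    return (int(np.count_nonzero(mask)),) + remaining

a = np.arange(24).reshape(2, 3, 4)

masks = [
    np.ones((2, 3), dtype=bool),  # 2-D mask: removes 2 axes
    np.array([True, False]),      # 1-D mask: removes 1 axis
    np.True_,                     # 0-D mask: removes 0 axes, adds length 1
    np.False_,                    # 0-D mask: removes 0 axes, adds length 0
]
for mask in masks:
    predicted = bool_index_shape(a.shape, mask)
    actual = a[mask].shape
    assert predicted == actual
    print(np.ndim(mask), predicted, actual)

# For N-D masks (N >= 1), the documented nonzero equivalence holds;
# per the footnote, it does not carry over to 0-D masks.
m = np.array([[True, False, True], [False, True, False]])
assert np.array_equal(a[m], a[m.nonzero()])
```

The helper only covers the leading-axes case. It does not model rule 3 (broadcasting with other advanced indices), which the shapes in the quoted session illustrate instead.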
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion