Re: [Numpy-discussion] Truth value of ndarray not Pythonic

Nathaniel Smith Thu, 29 Nov 2012 11:19:13 -0800

On 29 Nov 2012 16:27, "Jim Kitchen" <[email protected]> wrote:
>
> I understand the historical reason for the "Truth value of an array" error, 
> avoiding the pitfalls of:
> >>> a = np.arange(10)
> >>> if a==0: # this is ambiguous, use any() or all()
>
> I also understand the issues with logical and/or:
> >>> if (a<10 and a > 5): # this will break due to "and" vs "&"
>
> However the main point in this thread from 3 years ago is very valid.  If I 
> write code that uses lists and then convert that to an array for efficiency 
> or more powerful computation, I have my own pitfall trying to do:
> >>> if a: # why doesn't this just check for size?
>
> My Proposal
> ------------------
> It seems to me that an elegant solution to this dilemma is to separate the 
> behavior of ndarrays of type bool from all other ndarrays.  Keep the current 
> behavior for ndarrays of type bool, but let the __nonzero__ for all other 
> ndarrays be based on size.
>
> >>> if a==0: # still raises Truth error because it's of dtype bool
>
> >>> if (a<10 and a>5): # still raises Truth error because it's of dtype bool
>
> >>> if a: # works fine because dtype is int64


I see what you mean, but I think this change would be dangerously
confusing. The problem is that an ndarray of ints follows the
conventions of both Python lists and Python ints. E.g., it acts like a
list for len() and iteration, but like an int for arithmetic (+ does
addition, not concatenation). So your suggestion makes sense if you're
thinking of the list analogy first, but there are other people out
there who are going to think of the int analogy first instead. Python
has two very well established conventions for how boolean casting
works (emptiness for containers, zero-ness for scalars), and they
conflict here. The current behaviour follows the scalar convention
when possible, and almost always throws a safe error if people wrote
code expecting the container. Your version would be just as confusing
to a different set of people, but it wouldn't even fail safe by
raising an error; it would just silently do the wrong thing.
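
To make the two conventions concrete, here is a rough sketch. The
helper names describe_truth and proposed_truth are made up for
illustration; proposed_truth is only my reading of the size-based
rule suggested above, not anything that exists in numpy.

```python
import numpy as np

def describe_truth(obj):
    """Return bool(obj), or the string "ambiguous" if numpy refuses."""
    try:
        return bool(obj)
    except ValueError:
        # numpy raises ValueError for arrays with more than one element
        return "ambiguous"

def proposed_truth(arr):
    """Hypothetical size-based rule from the proposal (not a numpy API)."""
    return arr.size > 0

# Container convention: a non-empty list is true, whatever it holds.
print(describe_truth([0]))                 # True
# Scalar convention: zero is false.
print(describe_truth(0))                   # False
# A single-element array currently follows the scalar convention...
s = np.array([[0]])
print(describe_truth(s))                   # False
# ...and a multi-element array refuses to guess.
print(describe_truth(np.arange(10)))       # ambiguous
# Under the size-based rule the same single-element array flips to
# True, with no error to warn code that relied on the scalar reading.
print(proposed_truth(s))                   # True
```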

> This solution avoids all the primary pitfalls of ambiguity where someone 
> would need any() or all() because they're working with bools at the element 
> level.  But for cases where a function may return data or None, I really like 
> to use the normal Python truth test for that instead of:
> >>> if a is not None and len(a) > 0: # that was a chore to find out
>
> The only problem I see with this solution is with the case of the 
> single-element array.
> >>> s = np.array([[0]])
> >>> if s: # works today, returns False
>
> With my proposal,
> >>> if s: # still works, but returns True because array is not empty

This would break huge amounts of already existing code. So that means
that to get this change through, you'd have to not just convince
everyone that it was a good idea, but that it was such an improvement
that it'd be worth auditing all that code (and it's not even
greppable).

It strikes me as unusual, though, that you're testing for both None
and emptiness and treating them the same in your if statement. If your
function is returning an empty array as a 'special' value to signal
that something funny has happened, then perhaps it could just return
None in this case instead? If it's returning an empty array as an
ordinary value (e.g. when you happen to have zero data points that
fall into some category), then usually you don't need to check for
this explicitly, since numpy functions like sum() etc. will do the
right thing? Of course you might have some situation where everything
happens to line up so that what you wrote is the best solution, but
you might want to revisit it to check. Or post a longer example here
and see if anyone has suggestions for how to make it more
"numpythonic".
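
As a rough sketch of the two cases, assuming a made-up helper
points_in_range that always returns an array (possibly empty), and a
made-up has_data for the situations where an explicit check really is
wanted:

```python
import numpy as np

def points_in_range(values, low, high):
    """Hypothetical helper: return the values strictly between low and high.

    Always returns an array; it is simply empty when nothing matches.
    """
    values = np.asarray(values)
    # Use & rather than "and" to combine element-wise conditions.
    return values[(values > low) & (values < high)]

def has_data(a):
    """Explicit check that treats None and an empty array the same way."""
    return a is not None and a.size > 0

hits = points_in_range([1, 7, 12], 5, 10)
no_hits = points_in_range([1, 2, 3], 5, 10)

# An empty result is an ordinary value: reductions such as sum() work.
print(hits.sum())         # 7
print(no_hits.sum())      # 0
print(no_hits.size)       # 0

# When the distinction matters, .size says what is being tested.
print(has_data(hits))     # True
print(has_data(no_hits))  # False
print(has_data(None))     # False
```

Checking .size spells out that emptiness is the question, so it does
not depend on how the array's truth value is defined.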

Hope that helps,
-n

> It's a wart to be sure, but it would make ndarrays much easier to work with 
> when converting from standard Python containers.  Maybe we need something 
> like this (probably not possible):
> >>> from numpy.__future__ import truthiness
>
> I've especially found this Truth error a challenge converting from 
> dictionaries of lists to pandas DataFrames.  It raises the same error, 
> tracing back to this ambiguity in ndarrays.  If it's too big of a change to 
> make for ndarrays, maybe the same proposal could be implemented in pandas.
>
> Jim
> _______________________________________________
> NumPy-Discussion mailing list
> [email protected]
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
