On 29 Nov 2012 16:27, "Jim Kitchen" <[email protected]> wrote:
>
> I understand the historical reason for the "Truth value of an array" error,
> avoiding the pitfalls of:
> >>> a = np.arange(10)
> >>> if a==0: # this is ambiguous, use any() or all()
>
> I also understand the issues with logical and/or:
> >>> if (a<10 and a > 5): # this will break due to "and" vs "&"
>
> However the main point in this thread from 3 years ago is very valid. If I
> write code that uses lists and then convert that to an array for efficiency
> or more powerful computation, I have my own pitfall trying to do:
> >>> if a: # why doesn't this just check for size?
>
> My Proposal
> ------------------
> It seems to me that an elegant solution to this dilemma is to separate the
> behavior of ndarrays of type bool from all other ndarrays. Keep the current
> behavior for ndarrays of type bool, but let the __nonzero__ for all other
> ndarrays be based on size.
>
> >>> if a==0: # still raises Truth error because it's of dtype bool
>
> >>> if (a<10 and a>5): # still raises Truth error because it's of dtype bool
>
> >>> if a: # works fine because dtype is int64

I see what you mean, but I think this change would be dangerously confusing.
The problem is that an ndarray of ints follows the conventions of both Python
lists and Python ints. E.g., it acts like a list for len() and iteration, but
like an int for arithmetic (+ does addition, not concatenation). So your
suggestion makes sense if you're thinking of the list analogy first, but there
are other people out there who are going to think of the int analogy first
instead. Python has two very well established conventions for how boolean
casting works (emptiness for containers, zero-ness for scalars), and they
conflict here.

The current behaviour follows the scalar convention when possible, and almost
always throws a safe error if people wrote code expecting the container
convention. Your version would be just as confusing to a different set of
people, but wouldn't even fail safe by raising an error; it would just
silently do the wrong thing.

> This solution avoids all the primary pitfalls of ambiguity where someone
> would need any() or all() because they're working with bools at the element
> level. But for cases where a function may return data or None, I really like
> to use the normal Python truth test for that instead of:
> >>> if a is not None and len(a) > 0: # that was a chore to find out
>
> The only problem I see with this solution is with the case of the
> single-element array.
> >>> s = np.array([[0]])
> >>> if s: # works today, returns False
>
> With my proposal,
> >>> if s: # still works, but returns True because array is not empty

This would break huge amounts of already existing code. So that means that to
get this change through, you'd have to not just convince everyone that it was
a good idea, but that it was such an improvement that it'd be worth auditing
all that code (and it's not even greppable).

It strikes me as unusual, though, that you're testing for both None and
emptiness and treating them the same in your if statement.
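
For reference, here is a minimal sketch of the two boolean conventions at
issue, using only plain Python lists and ints, followed by an explicit form of
the "not None and not empty" check. The helper name has_data is hypothetical
(it is not part of numpy); it relies only on len(), so I'd expect it to behave
the same way for lists and for ndarrays with at least one dimension.

```python
def has_data(values):
    """Hypothetical helper: True only if values is not None and non-empty."""
    return values is not None and len(values) > 0


# Container convention: truth means "non-empty".
assert bool([]) is False
assert bool([0]) is True   # non-empty, even though its only element is zero

# Scalar convention: truth means "non-zero".
assert bool(0) is False
assert bool(5) is True

# The explicit check says which question is being asked, so it does not
# depend on which of the two conventions a type happens to follow.
assert has_data(None) is False
assert has_data([]) is False
assert has_data([0]) is True
```

An array holding a single zero is exactly where the two conventions disagree,
which is why spelling out the size check is the unambiguous option.
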
If your function is returning an empty array as a 'special' value to signal
that something funny has happened, then perhaps it could just return None in
this case instead? If it's returning an empty array as an ordinary value (e.g.
when you happen to have zero data points that fall into some category), then
usually you don't need to check for this explicitly, since numpy functions
like sum() etc. will do the right thing?

Of course you might have some situation where everything happens to line up so
that what you wrote is the best solution, but you might want to revisit it to
check. Or post a longer example here and see if anyone has suggestions for how
to make it more "numpythonic".

Hope that helps,
-n

> It's a wart to be sure, but it would make ndarrays much easier to work with
> when converting from standard Python containers. Maybe we need something
> like this (probably not possible):
> >>> from numpy.__future__ import truthiness
>
> I've especially found this Truth error a challenge converting from
> dictionaries of lists to pandas DataFrames. It raises the same error,
> tracing back to this ambiguity in ndarrays. If it's too big of a change to
> make for ndarrays, maybe the same proposal could be implemented in pandas.
>
> Jim
> _______________________________________________
> NumPy-Discussion mailing list
> [email protected]
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
