Hi Olivier,

> A key difference is that with arrays, the dtype is not chosen "just
> big enough" for your data to fit. Either you set the dtype yourself,
> or you're using the default inferred dtype (int/float). In both cases
> you should know what to expect, and it doesn't depend on the actual
> numeric values (except for the auto int/float distinction).
Yes, certainly; for example, you would get an int32/int64 if you simply
do "array(4)".  What I mean is, when you do "a+b" and b is a scalar, I
had assumed that the normal array rules for addition apply, with the
dtype of b taken to be the smallest precision that can hold its value.
E.g. 1 (int8) + 42 would treat 42 as an int8, while 1 (int8) + 200
would treat 200 as an int16.  If I'm not mistaken, this is what happens
currently (there's a quick demonstration in the P.S. below).

As far as knowing what to expect: as a library author I don't control
what my users supply.  I have to write conditional code to deal with
things like this, and that's my interest in this issue.  One way or
another I have to handle it correctly, and I'm trying to get a handle
on what that means.

> The ValueError is here to warn you that the operation may not be doing
> what you want. The rollover for smaller values would be the documented
> (and thus hopefully expected) behavior.

Right, but what confuses me is that the only thing this prevents is the
current upcast behavior.  Why is that so evil that it should be
replaced with an exception?

> Taking the addition as an example may be misleading, as it makes it
> look like we could just "always rollover" to obtain consistent
> behavior, and programmers are to some extent used to integer rollover
> on this kind of operation. However, I gave examples with "maximum"
> that I believe show it's not that easy (this behavior would just
> appear "wrong"). Another example is with the integer division, where
> casting the scalar silently would result in
>     array([-128], dtype=int8) // 128 -> [1]
> which is unlikely to be something someone would like to obtain.

But with the rule I outlined, this would be treated as:

    array([-128], dtype=int8) // array([128], dtype=int16) -> [-1] (int16)

> To summarize the goals of the proposal (in my mind):
> 1. Low cognitive load (simple and consistent across ufuncs).
> 2. Low risk of doing something unexpected.
> 3. Efficient by default.
> 4. Most existing (non-buggy) code should not be affected.
>
> If we always do the silent cast, it will significantly break existing
> code relying on the 1.6 behavior, and increase the risk of doing
> something unexpected (bad on #2 & #4).
> If we always upcast, we may break existing code and lose efficiency
> (bad on #3 and #4).
> If we keep the current behavior, we stay with something that's
> difficult to understand and has a high risk of doing weird things
> (bad on #1 and #2).

I suppose what really concerns me here is that, with respect to #2,
addition raising ValueError is really unexpected (at least to me).  I
don't have control over the values my users pass to me, which means I
am going to have to carefully check for the presence of scalars and
either use numpy.add or explicitly cast to a single-element array
before performing addition (or, as you point out, any similar
operation).  The P.S. below sketches what I mean.

From a more basic perspective, I think that adding a number to an array
should never raise an exception.  I've not used any other language in
which this happens.  In C you have rollover behavior, in IDL you roll
over or clip, and in NumPy you either roll or upcast, depending on the
version.  IDL and company manage to handle things like max() or total()
in a sensible (or at least defensible) fashion, and without raising an
error.

Andrew
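P.S. A quick demonstration of the rule as I understand the current
(1.6-style) behavior; the exact dtypes are what I'd expect on a
typical 64-bit build, so take this as a sketch rather than gospel:

    import numpy as np

    a = np.array([1], dtype=np.int8)

    # 42 fits in int8, so the scalar is treated as int8 and the
    # result keeps the array's dtype:
    print((a + 42).dtype)    # int8

    # 200 does not fit in int8, so the scalar is treated as a wider
    # type and the result upcasts:
    print((a + 200).dtype)   # int16

    # The integer-division example from above, with the scalar cast
    # to an explicit int16 array rather than silently wrapped to int8:
    b = np.array([-128], dtype=np.int8)
    print(b // np.array([128], dtype=np.int16))   # [-1], dtype int16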
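P.P.S. And a sketch of the sort of conditional handling I mean; the
helper name and the default dtype policy are just illustrative, not a
finished design:

    import numpy as np

    def add_scalar(a, x, dtype=None):
        # Hypothetical helper: if x is a plain scalar, wrap it in a
        # single-element array before adding, so that only the
        # array-array promotion rules apply and the result cannot
        # depend on x's particular value.  Which dtype to use for that
        # array is the policy decision a library has to make; with
        # dtype=None, NumPy infers the platform default (which
        # upcasts, e.g. int8 + 5 -> int64 on most 64-bit builds).
        if np.isscalar(x):
            x = np.array([x], dtype=dtype)
        return a + x

    a = np.array([1, 2, 3], dtype=np.int8)
    print(add_scalar(a, 200).dtype)                  # int64 (default)
    print(add_scalar(a, 200, dtype=np.int16).dtype)  # int16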