On Tue, Sep 18, 2012 at 4:42 PM, Charles R Harris <charlesr.har...@gmail.com > wrote:
> > > On Tue, Sep 18, 2012 at 2:33 PM, Travis Oliphant <tra...@continuum.io>wrote: > >> >> On Sep 18, 2012, at 2:44 PM, Charles R Harris wrote: >> >> >> >> On Tue, Sep 18, 2012 at 1:35 PM, Benjamin Root <ben.r...@ou.edu> wrote: >> >>> >>> >>> On Tue, Sep 18, 2012 at 3:25 PM, Charles R Harris < >>> charlesr.har...@gmail.com> wrote: >>> >>>> >>>> >>>> On Tue, Sep 18, 2012 at 1:13 PM, Benjamin Root <ben.r...@ou.edu> wrote: >>>> >>>>> >>>>> >>>>> On Tue, Sep 18, 2012 at 2:47 PM, Charles R Harris < >>>>> charlesr.har...@gmail.com> wrote: >>>>> >>>>>> >>>>>> >>>>>> On Tue, Sep 18, 2012 at 11:39 AM, Benjamin Root <ben.r...@ou.edu>wrote: >>>>>> >>>>>>> >>>>>>> >>>>>>> On Mon, Sep 17, 2012 at 9:33 PM, Charles R Harris < >>>>>>> charlesr.har...@gmail.com> wrote: >>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Mon, Sep 17, 2012 at 3:40 PM, Travis Oliphant < >>>>>>>> tra...@continuum.io> wrote: >>>>>>>> >>>>>>>>> >>>>>>>>> On Sep 17, 2012, at 8:42 AM, Benjamin Root wrote: >>>>>>>>> >>>>>>>>> > Consider the following code: >>>>>>>>> > >>>>>>>>> > import numpy as np >>>>>>>>> > a = np.array([1, 2, 3, 4, 5], dtype=np.int16) >>>>>>>>> > a *= float(255) / 15 >>>>>>>>> > >>>>>>>>> > In v1.6.x, this yields: >>>>>>>>> > array([17, 34, 51, 68, 85], dtype=int16) >>>>>>>>> > >>>>>>>>> > But in master, this throws an exception about failing to cast >>>>>>>>> via same_kind. >>>>>>>>> > >>>>>>>>> > Note that numpy was smart about this operation before, consider: >>>>>>>>> > a = np.array([1, 2, 3, 4, 5], dtype=np.int16) >>>>>>>>> > a *= float(128) / 256 >>>>>>>>> >>>>>>>>> > yields: >>>>>>>>> > array([0, 1, 1, 2, 2], dtype=int16) >>>>>>>>> > >>>>>>>>> > Of course, this is different than if one does it in a >>>>>>>>> non-in-place manner: >>>>>>>>> > np.array([1, 2, 3, 4, 5], dtype=np.int16) * 0.5 >>>>>>>>> > >>>>>>>>> > which yields an array with floating point dtype in both >>>>>>>>> versions. I can appreciate the arguments for preventing this kind of >>>>>>>>> implicit casting between non-same_kind dtypes, but I argue that >>>>>>>>> because the >>>>>>>>> operation is in-place, then I (as the programmer) am explicitly >>>>>>>>> stating >>>>>>>>> that I desire to utilize the current array to store the results of the >>>>>>>>> operation, dtype and all. Obviously, we can't completely turn off >>>>>>>>> this >>>>>>>>> rule (for example, an in-place addition between integer array and a >>>>>>>>> datetime64 makes no sense), but surely there is some sort of happy >>>>>>>>> medium >>>>>>>>> that would allow these sort of operations to take place? >>>>>>>>> > >>>>>>>>> > Lastly, if it is determined that it is desirable to allow >>>>>>>>> in-place operations to continue working like they have before, I >>>>>>>>> would like >>>>>>>>> to see such a fix in v1.7 because if it isn't in 1.7, then other >>>>>>>>> libraries >>>>>>>>> (such as matplotlib, where this issue was first found) would have to >>>>>>>>> change >>>>>>>>> their code anyway just to be compatible with numpy. >>>>>>>>> >>>>>>>>> I agree that in-place operations should allow different casting >>>>>>>>> rules. There are different opinions on this, of course, but >>>>>>>>> generally this >>>>>>>>> is how NumPy has worked in the past. >>>>>>>>> >>>>>>>>> We did decide to change the default casting rule to "same_kind" >>>>>>>>> but making an exception for in-place seems reasonable. >>>>>>>>> >>>>>>>> >>>>>>>> I think that in these cases same_kind will flag what are most >>>>>>>> likely programming errors and sloppy code. It is easy to be explicit >>>>>>>> and >>>>>>>> doing so will make the code more readable because it will be >>>>>>>> immediately >>>>>>>> obvious what the multiplicand is without the need to recall what the >>>>>>>> numpy >>>>>>>> casting rules are in this exceptional case. IISTR several mentions of >>>>>>>> this >>>>>>>> before (Gael?), and in some of those cases it turned out that bugs were >>>>>>>> being turned up. Catching bugs with minimal effort is a good thing. >>>>>>>> >>>>>>>> Chuck >>>>>>>> >>>>>>>> >>>>>>> True, it is quite likely to be a programming error, but then again, >>>>>>> there are many cases where it isn't. Is the problem strictly that we >>>>>>> are >>>>>>> trying to downcast the float to an int, or is it that we are trying to >>>>>>> downcast to a lower precision? Is there a way for one to explicitly >>>>>>> relax >>>>>>> the same_kind restriction? >>>>>>> >>>>>> >>>>>> I think the problem is down casting across kinds, with the result >>>>>> that floats are truncated and the imaginary parts of imaginaries might be >>>>>> discarded. That is, the value, not just the precision, of the rhs >>>>>> changes. >>>>>> So I'd favor an explicit cast in code like this, i.e., cast the rhs to an >>>>>> integer. >>>>>> >>>>>> It is true that this forces downstream to code up to a higher >>>>>> standard, but I don't see that as a bad thing, especially if it exposes >>>>>> bugs. And it isn't difficult to fix. >>>>>> >>>>>> Chuck >>>>>> >>>>>> >>>>> Mind you, in my case, casting the rhs as an integer before doing the >>>>> multiplication would be a bug, since our value for the rhs is usually >>>>> between zero and one. Multiplying first by the integer numerator before >>>>> dividing by the integer denominator would likely cause issues with >>>>> overflowing the 16 bit integer. >>>>> >>>>> >>>> For the case in point I'd do >>>> >>>> In [1]: a = np.array([1, 2, 3, 4, 5], dtype=np.int16) >>>> >>>> In [2]: a //= 2 >>>> >>>> In [3]: a >>>> Out[3]: array([0, 1, 1, 2, 2], dtype=int16) >>>> >>>> Although I expect you would want something different in practice. But >>>> the current code already looks fragile to me and I think it is a good thing >>>> you are taking a closer look at it. If you really intend going through a >>>> float, then it should be something like >>>> >>>> a = (a*(float(128)/256)).astype(int16) >>>> >>>> Chuck >>>> >>>> >>> And thereby losing the memory benefit of an in-place multiplication? >>> >> >> What makes you think you are getting that? I'd have to check the numpy C >> source, but I expect the multiplication is handled just as I wrote it out. >> I don't recall any loops that handle mixed types likes that. I'd like to >> see some, though, scaling integers is a common problem. >> >> >> >> >>> That is sort of the point of all this. We are using 16 bit integers >>> because we wanted to be as efficient as possible and didn't need anything >>> larger. Note, that is what we changed the code to, I am just wondering if >>> we are being too cautious. The casting kwarg looks to be what I might >>> want, though it isn't as clean as just writing an "*=" statement. >>> >>> >> I think even there you will have an intermediate float array followed by >> a cast. >> >> >> This is true, but it is done in chunks of a fixed size (controllable by a >> thread-local variable or keyword argument to the ufunc). >> >> How difficult would it be to change in-place operations back to the >> "unsafe" default? >> > > Probably not too difficult, but I think it would be a mistake. What > keyword argument are you referring to? In the current case, I think what is > wanted is a scaling function that will actually do things in place. The > matplotlib folks would probably be happier with the result if they simply > coded up a couple of small Cython routines to do that. > > Chuck > > As far as matplotlib is concerned, the problem was solved when we reverted a change. The issue that I am raising is that it was such an innocuous, and frankly, obvious change to do an in-place operation in the first place. I have to wonder if we are being overly cautious with "same_kind". You are right, we probably would benefit greatly from creating some CXX scaling functions (contrary to popular belief, we don't use Cython), however, I would imagine that such general-purpose function would fare better within NumPy. But, ultimately, Python is about there being one right way of doing something, and so I think the goal should be to have a somewhat more restrictive casting rule than "unsafe" for in-place operations, but restrictive enough to catch the sort of errors "same_kind" was catching. This way, I have one way of doing an inplace operation, regardless of the types of my operands. Cheers, Ben Root
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion