On Thu, Nov 13, 2014 at 8:10 AM, Sebastian <se...@sebix.at> wrote:

> On 2014-11-04 19:44, Charles R Harris wrote:
> > On Tue, Nov 4, 2014 at 11:19 AM, Sebastian <se...@sebix.at> wrote:
> >
> >> On 2014-11-04 15:06, Todd wrote:
> >>> On Tue, Nov 4, 2014 at 2:50 PM, Sebastian Wagner <se...@sebix.at
> >>> <mailto:se...@sebix.at>> wrote:
> >>>
> >>>     Hello,
> >>>
> >>>     I want to bring up Issue #2522 'numpy.diff fails on unsigned
> >>>     integers (Trac #1929)' [1], as it was responsible for an error
> >>>     in one of our programs. Short explanation of the bug: np.diff
> >>>     performs a subtraction on the input array. If this is of type
> >>>     uint and the data contains falling data, it results in an
> >>>     arithmetic underflow.
> >>>
> >>>     >>> np.diff(np.array([0,1,0], dtype=np.uint8))
> >>>     array([ 1, 255], dtype=uint8)
> >>>
> >>>     @charris proposed either
> >>>     - a note to the doc string and maybe an example to clarify
> >>>       things
> >>>     - or raise a warning
> >>>     but with a discussion on the list.
> >>>
> >>>     I would like to start it now, as it is an error which is not
> >>>     easily detectable (no errors or warnings are thrown). In our
> >>>     case the type of a data sequence with only zeros and ones,
> >>>     which had type f8 like every other one, was changed to u4. As
> >>>     the programs looked for values ==1 and ==-1, it broke silently.
> >>>     In my opinion, a note in the docs is not enough and does not
> >>>     help if the type is changed or set after the program has been
> >>>     written.
> >>>     I'd go for automatic upcasting of uints by default and an
> >>>     option to turn it off, if this behavior is explicitly wanted.
> >>>     This wouldn't be correct from the point of view of a
> >>>     programmer, but most of the users have a scientific background
> >>>     and expect it 'to work', instead of something that is
> >>>     theoretically correct but not convenient. (I count myself in
> >>>     the first group.)
> >>>
> >>> When you say "automatic upcasting", that would be, for example,
> >>> uint8 to int16? What about for uint64? There is no int128.
> >> The upcast should go to the next bigger type, otherwise it would
> >> again result in wrong values. For uint64 we can't do that, so it
> >> has to stay.
> >>> Also, when you say "by default", is this only when an overflow is
> >>> detected, or always?
> >> I don't know how I could detect an overflow in the diff function.
> >> In subtraction it should be possible, but that's very deep in the
> >> numpy internals.
> >>> How would the option to turn it off be implemented? An argument
> >>> to np.diff or some sort of global option?
> >> I thought of a parameter upcast_int=True for the function.
> >
> > Could check for non-decreasing sequence in the unsigned case. Note
> > that differences of signed integers can also overflow. One way to
> > check in general is to determine the expected sign using comparisons.
>
> I think you mean a decreasing/non-increasing instead of non-decreasing
> sequence?
> It's also the same check as checking for a sorted sequence. But I
> currently don't know how I could do that efficiently without np.diff in
> Python; in Cython it should be easily possible.
>
> np.gradient has the same problem:
> >>> np.random.seed(89)
> >>> d = np.random.randint(0,2,size=10).astype(np.uint8); d
> array([1, 0, 0, 1, 0, 1, 1, 0, 0, 0], dtype=uint8)
> >>> np.diff(d)
> array([255, 0, 1, 255, 1, 0, 255, 0, 0], dtype=uint8)
> >>> np.gradient(d)
> array([ 255. , 127.5, 0.5, 0. , 0. , 0.5, 127.5, 127.5,
> 0. , 0. ])
>
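On checking for a decreasing step without np.diff: one possibility is to compare adjacent elements directly, which stays vectorized and never subtracts. A minimal sketch for the 1-D unsigned case only; the function name `unsigned_diff_wraps` is made up for illustration and is not part of NumPy:

```python
import numpy as np

def unsigned_diff_wraps(a):
    """Return True if np.diff(a) would wrap around for a 1-D unsigned array.

    Hypothetical helper for illustration; not a NumPy function.
    """
    a = np.asarray(a)
    if a.dtype.kind != "u":
        # Signed overflow is a separate case and is not handled here.
        return False
    # a[i+1] < a[i] means the true difference is negative,
    # which an unsigned dtype cannot represent.
    return bool(np.any(a[1:] < a[:-1]))

print(unsigned_diff_wraps(np.array([0, 1, 0], dtype=np.uint8)))  # True
print(unsigned_diff_wraps(np.array([0, 1, 2], dtype=np.uint8)))  # False
```

This is the same comparison one would use to test whether the sequence is sorted, so it costs one extra pass over the data.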
Considering that this is generally an error, might it be good to build a general overflow warning into the integer dtypes? The diff function could then catch that warning.
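Until something like that exists in the dtypes, the behavior can be approximated in a wrapper. The sketch below is only an illustration for 1-D input: `diff_upcast` and its `upcast_int` parameter are hypothetical (the parameter name is the one suggested earlier in the thread), and the choice of RuntimeWarning is my assumption. It upcasts to the next larger signed type where one exists and otherwise warns when a wraparound would occur:

```python
import warnings
import numpy as np

def diff_upcast(a, upcast_int=True):
    """np.diff for 1-D input that avoids silent unsigned wraparound.

    Hypothetical wrapper for illustration; not part of NumPy.
    """
    a = np.asarray(a)
    if a.dtype.kind == "u":
        if upcast_int and a.dtype.itemsize < 8:
            # uint8 -> int16, uint16 -> int32, uint32 -> int64
            signed = np.dtype("i%d" % (2 * a.dtype.itemsize))
            return np.diff(a.astype(signed))
        # uint64 has no larger signed type (or upcasting was turned off):
        # warn if any step is decreasing, since the result will wrap.
        if np.any(a[1:] < a[:-1]):
            warnings.warn("unsigned difference wraps around",
                          RuntimeWarning, stacklevel=2)
    return np.diff(a)

print(diff_upcast(np.array([0, 1, 0], dtype=np.uint8)))  # [ 1 -1]
```

A caller that wants a hard failure can turn the warning into an exception with `warnings.simplefilter("error", RuntimeWarning)`.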