> By default an error should probably be raised I would very strongly recommend keeping the behavior consistent with that of analytic DBMSes. I don't think that most analytic DBMS error on overflows because it's too computationally expensive to check. NumPy doesn't error (by default at least) when you have overflows either:
>>> arr = np.array([128, 128, 128], dtype='i1') >>> arr + arr array([0, 0, 0], dtype=int8) On Wed, Jun 3, 2020 at 10:29 AM Antoine Pitrou <solip...@pitrou.net> wrote: > > On Wed, 3 Jun 2020 10:47:38 -0400 > Ben Kietzman <ben.kietz...@rstudio.com> wrote: > > https://github.com/apache/arrow/pull/7341#issuecomment-638241193 > > > > How should arithmetic kernels handle integer overflow? > > > > The approach currently taken in the linked PR is to promote such that > > overflow will not occur, for example `(int8, int8)->int16` and `(uint16, > > uint16)->uint32`. > > > > I'm not sure that's desirable. For one thing this leads to inconsistent > > handling of 64 bit integer types, which are currently allowed to overflow > > since we cannot promote further (NB: that means this kernel includes > > undefined behavior for int64). > > I agree with you. I would strongly advise against implicit promotion > accross arithmetic operations. We initially did that in Numba and it > quickly became a can of worms. > > The most desirable behaviour IMHO is to keep the original type, so: > - (int8, int8) -> int8 > - (uint16, uint16) -> uint16 > > Then the question is what happens when the actual overflow occurs. I > think this should be directed by a kernel option. By default an error > should probably be raised (letting errors pass and silently produce > erroneous data is wrong), but we might want to allow people to bypass > overflow checks for speed. > > Note that even if overflow detection is enabled, it *should* be possible > to enable vectorization, e.g. by making overflow detection a separate > pass (itself vectorizable). > > Regards > > Antoine. > >