On Wed, Jun 3, 2020 at 11:25 AM Krisztián Szűcs
<szucs.kriszt...@gmail.com> wrote:
>
> On Wed, Jun 3, 2020 at 6:16 PM Krisztián Szűcs
> <szucs.kriszt...@gmail.com> wrote:
> >
> > On Wed, Jun 3, 2020 at 5:52 PM Wes McKinney <wesmck...@gmail.com> wrote:
> > >
> > > On Wed, Jun 3, 2020 at 10:49 AM Krisztián Szűcs
> > > <szucs.kriszt...@gmail.com> wrote:
> > > >
> > > > From the user perspective I find the following pretty confusing:
> > > >
> > > > In [1]: np.array([-128, 127], dtype=np.int8()) * 2
> > > > Out[1]: array([ 0, -2], dtype=int8)
> > > >
> > > > In [2]: np.array([-128, 127], dtype=np.int16()) * 2
> > > > Out[2]: array([-256,  254], dtype=int16)
> > > >
> > > > In my opinion somewhere (on a higher level maybe) we should provide
> > > > the correct results promoted to a wider type implicitly.
> > >
> > > Yes, I agree with you, but I agree that the best place to address this
> > > is at a higher level rather than having this logic implemented at the
> > > lowest level (kernels) -- I think database systems handle this during
> > > logical->physical planning.
> >
> > It raises another question: where to incorporate the implicit promotions?
> > // correct me if I'm wrong but these implicit promotions are operation
> > // dependent and distinct from kernel dispatching issue [1]
> >
> > The numpy example above can be roughly translated to:
> > >>> a = pa.array([-128, 127])
> > >>> pa.compute.add(a, a)
> > array([ 0, -2]
> >
> > Which is rather surprising from the user's perspective.
>
> Would it be enough to document the exact behavior  and advice the user
> to place casts until we have logical -> phisycal machinery?

I think it's enough to clearly document the behavior and assume that
the "user" will act according to what semantics are desired for their
use cases. Per my comments in my last e-mail I don't think the users
of these functions need to be handled with "kid's gloves".

> I'm updating my PR as discussed.
> >
> > [1] https://issues.apache.org/jira/browse/ARROW-8919
> > >
> > > > Clickhouse for example does the type promotion.
> > > >
> > > > On Wed, Jun 3, 2020 at 5:29 PM Antoine Pitrou <solip...@pitrou.net> 
> > > > wrote:
> > > > >
> > > > > On Wed, 3 Jun 2020 10:47:38 -0400
> > > > > Ben Kietzman <ben.kietz...@rstudio.com> wrote:
> > > > > > https://github.com/apache/arrow/pull/7341#issuecomment-638241193
> > > > > >
> > > > > > How should arithmetic kernels handle integer overflow?
> > > > > >
> > > > > > The approach currently taken in the linked PR is to promote such 
> > > > > > that
> > > > > > overflow will not occur, for example `(int8, int8)->int16` and 
> > > > > > `(uint16,
> > > > > > uint16)->uint32`.
> > > > > >
> > > > > > I'm not sure that's desirable. For one thing this leads to 
> > > > > > inconsistent
> > > > > > handling of 64 bit integer types, which are currently allowed to 
> > > > > > overflow
> > > > > > since we cannot promote further (NB: that means this kernel includes
> > > > > > undefined behavior for int64).
> > > > >
> > > > > I agree with you.  I would strongly advise against implicit promotion
> > > > > accross arithmetic operations.  We initially did that in Numba and it
> > > > > quickly became a can of worms.
> > > > >
> > > > > The most desirable behaviour IMHO is to keep the original type, so:
> > > > > - (int8, int8) -> int8
> > > > > - (uint16, uint16) -> uint16
> > > > >
> > > > > Then the question is what happens when the actual overflow occurs.  I
> > > > > think this should be directed by a kernel option.  By default an error
> > > > > should probably be raised (letting errors pass and silently produce
> > > > > erroneous data is wrong), but we might want to allow people to bypass
> > > > > overflow checks for speed.
> > > > >
> > > > > Note that even if overflow detection is enabled, it *should* be 
> > > > > possible
> > > > > to enable vectorization, e.g. by making overflow detection a separate
> > > > > pass (itself vectorizable).
> > > > >
> > > > > Regards
> > > > >
> > > > > Antoine.
> > > > >
> > > > >

Reply via email to