Re: sum of integer overflow

Li Yang Thu, 13 Aug 2015 23:23:29 -0700

https://issues.apache.org/jira/browse/CALCITE-845  created


On Thu, Aug 13, 2015 at 4:23 AM, Julian Hyde <[email protected]> wrote:

> OK, if you want this feature please log a JIRA case to track it.
>
> > On Aug 11, 2015, at 2:17 AM, Li Yang <[email protected]> wrote:
> >
> > A customizable policy is great since there's no standard.
> >
> > On Tue, Aug 11, 2015 at 5:54 AM, Jinfeng Ni <[email protected]>
> wrote:
> >
> >> I think adding a policy in Calcite to derive SUM's return type makes
> sense,
> >> as SQL standard seems not specify exactly what the return type of SUM
> >> should be. As a result, each system could choose to use different
> policy in
> >> the implementation.
> >>
> >> Oracle seems to use "the same data type as the numeric data type of the
> >> argument" [1], while DB2 uses the policy "The result is a large integer
> if
> >> the argument values are small integers" [2].
> >>
> >>
> >> [1] http://docs.oracle.com/database/121/SQLRF/functions196.htm#i89126
> >> [2]
> >>
> >>
> http://public.dhe.ibm.com/ps/products/db2/info/vr101/pdf/en_US/DB2SQLRefVol1-db2s1e1011.pdf
> >>
> >> On Mon, Aug 10, 2015 at 2:05 PM, Julian Hyde <[email protected]> wrote:
> >>
> >>> Postgres return type is "bigint for smallint or int arguments, numeric
> >> for
> >>> bigint arguments, otherwise the same as the argument data type”[1]
> >>>
> >>> SQL Server return type is int for tinyint, smallint or int; bigint for
> >>> bigint[2].
> >>>
> >>> I can see your point that “User demands the correct sum result”. But
> I’d
> >>> also be pissed with Postgres if it returned a numeric (arbitrary
> >> precision)
> >>> result when I am summing a bigint value. So I don’t think we’re going
> to
> >>> please everyone.
> >>>
> >>> I think the solution is to add the policy to derive SUM’s return type
> to
> >>> as a new method to RelDataTypeSystem. Then Kylin can supply its own.
> >>>
> >>> Julian
> >>>
> >>> [1] http://www.postgresql.org/docs/9.1/static/functions-aggregate.html
> <
> >>> http://www.postgresql.org/docs/9.1/static/functions-aggregate.html>
> >>>
> >>> [2] https://msdn.microsoft.com/en-us/library/ms187810.aspx <
> >>> https://msdn.microsoft.com/en-us/library/ms187810.aspx>
> >>>
> >>>> On Aug 9, 2015, at 8:09 PM, Li Yang <[email protected]> wrote:
> >>>>
> >>>>> 1. If x is an integer, what is the type of sum(x)?
> >>>>
> >>>> This is the key question.  If calcite believes sum(int) = int, then
> >> Kylin
> >>>> have to find solution else where.  User demands the correct sum result
> >>>> anyway.  It's very hard to explain and justify the behavior to user,
> >>>> because other SQL engines like Postgres seem simply works...
> >>>>
> >>>> One workaround maybe let all initial values be bigint.
> >>>>
> >>>> On Sat, Aug 8, 2015 at 9:02 AM, Julian Hyde <[email protected]> wrote:
> >>>>
> >>>>> I would approach it a different way. There are 2 separate questions:
> >>>>>
> >>>>> 1. If x is an integer, what is the type of sum(x)?
> >>>>>
> >>>>> This concerns SQL query validation. Calcite’s answer is that if x has
> >>> type
> >>>>> T, then sum(x) has type T. Not perfect, but simple. If x is an int
> and
> >>> you
> >>>>> want the result to be a bigint, just write sum(cast(x as bigint)).
> >>>>>
> >>>>> 2. Do we detect overflow while calculating sum, and if so, how?
> >>>>>
> >>>>> This is an implementation question, and needs to be solved in each
> >>> engine.
> >>>>> Drill is one such engine, and Enumerable is another. Enumerable does
> >> not
> >>>>> currently detect overflow.
> >>>>>
> >>>>> One strategy would be to use a  higher precision data type internally
> >>> (but
> >>>>> this strategy works only if you have an upper bound on the number of
> >>> input
> >>>>> rows). Another is to use a method such as
> java.lang.Math.addExact(int,
> >>> int).
> >>>>>
> >>>>> Julian
> >>>>>
> >>>>>
> >>>>>
> >>>>>> On Aug 7, 2015, at 10:38 AM, Jinfeng Ni <[email protected]>
> >> wrote:
> >>>>>>
> >>>>>> I think it makes sense to use bigint as the result type for
> >>> sum(integer).
> >>>>>>
> >>>>>> Postgres seems to work in this way.
> >>>>>>
> >>>>>> mydb=# \d+ emp
> >>>>>>                                Table "public.emp"
> >>>>>> Column  |         Type          | Modifiers | Storage  | Stats
> >> target |
> >>>>>> Description
> >>>>>>
> >>>>>
> >>>
> >>
> ----------+-----------------------+-----------+----------+--------------+-------------
> >>>>>> empno    | integer               |           | plain    |
> >>> |
> >>>>>> .....
> >>>>>>
> >>>>>> create table tmp as select sum(empno) sum_eno from emp;
> >>>>>>
> >>>>>> \d+ tmp;
> >>>>>>                       Table "public.tmp"
> >>>>>> Column  |  Type  | Modifiers | Storage | Stats target | Description
> >>>>>>
> ---------+--------+-----------+---------+--------------+-------------
> >>>>>> sum_eno | bigint |           | plain   |              |
> >>>>>>
> >>>>>>
> >>>>>> As we can see, the column sum_eno in 'tmp' table after the CTAS
> >>> statement
> >>>>>> has bigint type.
> >>>>>>
> >>>>>> In Drill, we also use bigint for sum(integer).  Drill has to put
> >>>>> additional
> >>>>>> logic, since Calcite by default will use int as the result type for
> >>>>>> sum(int).
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On Fri, Aug 7, 2015 at 6:51 AM, hongbin ma <[email protected]>
> >>> wrote:
> >>>>>>
> >>>>>>> hi,
> >>>>>>>
> >>>>>>> Suppose I have a table column called "price", its data type is
> >>> integer.
> >>>>>>> it seems that the sum aggregator in "select sum(price) from table"
> >>> will
> >>>>>>> return integer type, too.
> >>>>>>>
> >>>>>>> When I have millions of rows in the table,
> >>>>>>> "select sum(price) from table" might overflow, is it a bug?
> >>>>>>> Or may I how do you look into this problem?
> >>>>>>>
> >>>>>>> --
> >>>>>>> Regards,
> >>>>>>>
> >>>>>>> *Bin Mahone | 马洪宾*
> >>>>>>> Apache Kylin: http://kylin.io
> >>>>>>> Github: https://github.com/binmahone
> >>>>>>>
> >>>>>
> >>>>>
> >>>
> >>>
> >>
>
>

Re: sum of integer overflow

Reply via email to