Re: sum of integer overflow

Julian Hyde Wed, 12 Aug 2015 14:50:10 -0700
OK, if you want this feature please log a JIRA case to track it.

> On Aug 11, 2015, at 2:17 AM, Li Yang <[email protected]> wrote:
> 
> A customizable policy is great since there's no standard.
> 
> On Tue, Aug 11, 2015 at 5:54 AM, Jinfeng Ni <[email protected]> wrote:
> 
>> I think adding a policy in Calcite to derive SUM's return type makes sense,
>> as SQL standard seems not specify exactly what the return type of SUM
>> should be. As a result, each system could choose to use different policy in
>> the implementation.
>> 
>> Oracle seems to use "the same data type as the numeric data type of the
>> argument" [1], while DB2 uses the policy "The result is a large integer if
>> the argument values are small integers" [2].
>> 
>> 
>> [1] http://docs.oracle.com/database/121/SQLRF/functions196.htm#i89126
>> [2]
>> 
>> http://public.dhe.ibm.com/ps/products/db2/info/vr101/pdf/en_US/DB2SQLRefVol1-db2s1e1011.pdf
>> 
>> On Mon, Aug 10, 2015 at 2:05 PM, Julian Hyde <[email protected]> wrote:
>> 
>>> Postgres return type is "bigint for smallint or int arguments, numeric
>> for
>>> bigint arguments, otherwise the same as the argument data type”[1]
>>> 
>>> SQL Server return type is int for tinyint, smallint or int; bigint for
>>> bigint[2].
>>> 
>>> I can see your point that “User demands the correct sum result”. But I’d
>>> also be pissed with Postgres if it returned a numeric (arbitrary
>> precision)
>>> result when I am summing a bigint value. So I don’t think we’re going to
>>> please everyone.
>>> 
>>> I think the solution is to add the policy to derive SUM’s return type to
>>> as a new method to RelDataTypeSystem. Then Kylin can supply its own.
>>> 
>>> Julian
>>> 
>>> [1] http://www.postgresql.org/docs/9.1/static/functions-aggregate.html <
>>> http://www.postgresql.org/docs/9.1/static/functions-aggregate.html>
>>> 
>>> [2] https://msdn.microsoft.com/en-us/library/ms187810.aspx <
>>> https://msdn.microsoft.com/en-us/library/ms187810.aspx>
>>> 
>>>> On Aug 9, 2015, at 8:09 PM, Li Yang <[email protected]> wrote:
>>>> 
>>>>> 1. If x is an integer, what is the type of sum(x)?
>>>> 
>>>> This is the key question.  If calcite believes sum(int) = int, then
>> Kylin
>>>> have to find solution else where.  User demands the correct sum result
>>>> anyway.  It's very hard to explain and justify the behavior to user,
>>>> because other SQL engines like Postgres seem simply works...
>>>> 
>>>> One workaround maybe let all initial values be bigint.
>>>> 
>>>> On Sat, Aug 8, 2015 at 9:02 AM, Julian Hyde <[email protected]> wrote:
>>>> 
>>>>> I would approach it a different way. There are 2 separate questions:
>>>>> 
>>>>> 1. If x is an integer, what is the type of sum(x)?
>>>>> 
>>>>> This concerns SQL query validation. Calcite’s answer is that if x has
>>> type
>>>>> T, then sum(x) has type T. Not perfect, but simple. If x is an int and
>>> you
>>>>> want the result to be a bigint, just write sum(cast(x as bigint)).
>>>>> 
>>>>> 2. Do we detect overflow while calculating sum, and if so, how?
>>>>> 
>>>>> This is an implementation question, and needs to be solved in each
>>> engine.
>>>>> Drill is one such engine, and Enumerable is another. Enumerable does
>> not
>>>>> currently detect overflow.
>>>>> 
>>>>> One strategy would be to use a  higher precision data type internally
>>> (but
>>>>> this strategy works only if you have an upper bound on the number of
>>> input
>>>>> rows). Another is to use a method such as java.lang.Math.addExact(int,
>>> int).
>>>>> 
>>>>> Julian
>>>>> 
>>>>> 
>>>>> 
>>>>>> On Aug 7, 2015, at 10:38 AM, Jinfeng Ni <[email protected]>
>> wrote:
>>>>>> 
>>>>>> I think it makes sense to use bigint as the result type for
>>> sum(integer).
>>>>>> 
>>>>>> Postgres seems to work in this way.
>>>>>> 
>>>>>> mydb=# \d+ emp
>>>>>>                                Table "public.emp"
>>>>>> Column  |         Type          | Modifiers | Storage  | Stats
>> target |
>>>>>> Description
>>>>>> 
>>>>> 
>>> 
>> ----------+-----------------------+-----------+----------+--------------+-------------
>>>>>> empno    | integer               |           | plain    |
>>> |
>>>>>> .....
>>>>>> 
>>>>>> create table tmp as select sum(empno) sum_eno from emp;
>>>>>> 
>>>>>> \d+ tmp;
>>>>>>                       Table "public.tmp"
>>>>>> Column  |  Type  | Modifiers | Storage | Stats target | Description
>>>>>> ---------+--------+-----------+---------+--------------+-------------
>>>>>> sum_eno | bigint |           | plain   |              |
>>>>>> 
>>>>>> 
>>>>>> As we can see, the column sum_eno in 'tmp' table after the CTAS
>>> statement
>>>>>> has bigint type.
>>>>>> 
>>>>>> In Drill, we also use bigint for sum(integer).  Drill has to put
>>>>> additional
>>>>>> logic, since Calcite by default will use int as the result type for
>>>>>> sum(int).
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Fri, Aug 7, 2015 at 6:51 AM, hongbin ma <[email protected]>
>>> wrote:
>>>>>> 
>>>>>>> hi,
>>>>>>> 
>>>>>>> Suppose I have a table column called "price", its data type is
>>> integer.
>>>>>>> it seems that the sum aggregator in "select sum(price) from table"
>>> will
>>>>>>> return integer type, too.
>>>>>>> 
>>>>>>> When I have millions of rows in the table,
>>>>>>> "select sum(price) from table" might overflow, is it a bug?
>>>>>>> Or may I how do you look into this problem?
>>>>>>> 
>>>>>>> --
>>>>>>> Regards,
>>>>>>> 
>>>>>>> *Bin Mahone | 马洪宾*
>>>>>>> Apache Kylin: http://kylin.io
>>>>>>> Github: https://github.com/binmahone
>>>>>>> 
>>>>> 
>>>>> 
>>> 
>>> 
>>
Re: sum of integer overflow

Reply via email to