> 1. If x is an integer, what is the type of sum(x)?

This is the key question.  If calcite believes sum(int) = int, then Kylin
have to find solution else where.  User demands the correct sum result
anyway.  It's very hard to explain and justify the behavior to user,
because other SQL engines like Postgres seem simply works...

One workaround maybe let all initial values be bigint.

On Sat, Aug 8, 2015 at 9:02 AM, Julian Hyde <[email protected]> wrote:

> I would approach it a different way. There are 2 separate questions:
>
> 1. If x is an integer, what is the type of sum(x)?
>
> This concerns SQL query validation. Calcite’s answer is that if x has type
> T, then sum(x) has type T. Not perfect, but simple. If x is an int and you
> want the result to be a bigint, just write sum(cast(x as bigint)).
>
> 2. Do we detect overflow while calculating sum, and if so, how?
>
> This is an implementation question, and needs to be solved in each engine.
> Drill is one such engine, and Enumerable is another. Enumerable does not
> currently detect overflow.
>
> One strategy would be to use a  higher precision data type internally (but
> this strategy works only if you have an upper bound on the number of input
> rows). Another is to use a method such as java.lang.Math.addExact(int, int).
>
> Julian
>
>
>
> > On Aug 7, 2015, at 10:38 AM, Jinfeng Ni <[email protected]> wrote:
> >
> > I think it makes sense to use bigint as the result type for sum(integer).
> >
> > Postgres seems to work in this way.
> >
> > mydb=# \d+ emp
> >                                  Table "public.emp"
> >  Column  |         Type          | Modifiers | Storage  | Stats target |
> > Description
> >
> ----------+-----------------------+-----------+----------+--------------+-------------
> > empno    | integer               |           | plain    |              |
> > .....
> >
> > create table tmp as select sum(empno) sum_eno from emp;
> >
> > \d+ tmp;
> >                         Table "public.tmp"
> > Column  |  Type  | Modifiers | Storage | Stats target | Description
> > ---------+--------+-----------+---------+--------------+-------------
> > sum_eno | bigint |           | plain   |              |
> >
> >
> > As we can see, the column sum_eno in 'tmp' table after the CTAS statement
> > has bigint type.
> >
> > In Drill, we also use bigint for sum(integer).  Drill has to put
> additional
> > logic, since Calcite by default will use int as the result type for
> > sum(int).
> >
> >
> >
> >
> > On Fri, Aug 7, 2015 at 6:51 AM, hongbin ma <[email protected]> wrote:
> >
> >> hi,
> >>
> >> Suppose I have a table column called "price", its data type is integer.
> >> it seems that the sum aggregator in "select sum(price) from table" will
> >> return integer type, too.
> >>
> >> When I have millions of rows in the table,
> >> "select sum(price) from table" might overflow, is it a bug?
> >> Or may I how do you look into this problem?
> >>
> >> --
> >> Regards,
> >>
> >> *Bin Mahone | 马洪宾*
> >> Apache Kylin: http://kylin.io
> >> Github: https://github.com/binmahone
> >>
>
>

Reply via email to