Postgres return type is "bigint for smallint or int arguments, numeric for bigint arguments, otherwise the same as the argument data type”[1]
SQL Server return type is int for tinyint, smallint or int; bigint for bigint[2]. I can see your point that “User demands the correct sum result”. But I’d also be pissed with Postgres if it returned a numeric (arbitrary precision) result when I am summing a bigint value. So I don’t think we’re going to please everyone. I think the solution is to add the policy to derive SUM’s return type to as a new method to RelDataTypeSystem. Then Kylin can supply its own. Julian [1] http://www.postgresql.org/docs/9.1/static/functions-aggregate.html <http://www.postgresql.org/docs/9.1/static/functions-aggregate.html> [2] https://msdn.microsoft.com/en-us/library/ms187810.aspx <https://msdn.microsoft.com/en-us/library/ms187810.aspx> > On Aug 9, 2015, at 8:09 PM, Li Yang <[email protected]> wrote: > >> 1. If x is an integer, what is the type of sum(x)? > > This is the key question. If calcite believes sum(int) = int, then Kylin > have to find solution else where. User demands the correct sum result > anyway. It's very hard to explain and justify the behavior to user, > because other SQL engines like Postgres seem simply works... > > One workaround maybe let all initial values be bigint. > > On Sat, Aug 8, 2015 at 9:02 AM, Julian Hyde <[email protected]> wrote: > >> I would approach it a different way. There are 2 separate questions: >> >> 1. If x is an integer, what is the type of sum(x)? >> >> This concerns SQL query validation. Calcite’s answer is that if x has type >> T, then sum(x) has type T. Not perfect, but simple. If x is an int and you >> want the result to be a bigint, just write sum(cast(x as bigint)). >> >> 2. Do we detect overflow while calculating sum, and if so, how? >> >> This is an implementation question, and needs to be solved in each engine. >> Drill is one such engine, and Enumerable is another. Enumerable does not >> currently detect overflow. >> >> One strategy would be to use a higher precision data type internally (but >> this strategy works only if you have an upper bound on the number of input >> rows). Another is to use a method such as java.lang.Math.addExact(int, int). >> >> Julian >> >> >> >>> On Aug 7, 2015, at 10:38 AM, Jinfeng Ni <[email protected]> wrote: >>> >>> I think it makes sense to use bigint as the result type for sum(integer). >>> >>> Postgres seems to work in this way. >>> >>> mydb=# \d+ emp >>> Table "public.emp" >>> Column | Type | Modifiers | Storage | Stats target | >>> Description >>> >> ----------+-----------------------+-----------+----------+--------------+------------- >>> empno | integer | | plain | | >>> ..... >>> >>> create table tmp as select sum(empno) sum_eno from emp; >>> >>> \d+ tmp; >>> Table "public.tmp" >>> Column | Type | Modifiers | Storage | Stats target | Description >>> ---------+--------+-----------+---------+--------------+------------- >>> sum_eno | bigint | | plain | | >>> >>> >>> As we can see, the column sum_eno in 'tmp' table after the CTAS statement >>> has bigint type. >>> >>> In Drill, we also use bigint for sum(integer). Drill has to put >> additional >>> logic, since Calcite by default will use int as the result type for >>> sum(int). >>> >>> >>> >>> >>> On Fri, Aug 7, 2015 at 6:51 AM, hongbin ma <[email protected]> wrote: >>> >>>> hi, >>>> >>>> Suppose I have a table column called "price", its data type is integer. >>>> it seems that the sum aggregator in "select sum(price) from table" will >>>> return integer type, too. >>>> >>>> When I have millions of rows in the table, >>>> "select sum(price) from table" might overflow, is it a bug? >>>> Or may I how do you look into this problem? >>>> >>>> -- >>>> Regards, >>>> >>>> *Bin Mahone | 马洪宾* >>>> Apache Kylin: http://kylin.io >>>> Github: https://github.com/binmahone >>>> >> >>
