https://issues.apache.org/jira/browse/CALCITE-845 created
On Thu, Aug 13, 2015 at 4:23 AM, Julian Hyde <[email protected]> wrote: > OK, if you want this feature please log a JIRA case to track it. > > > On Aug 11, 2015, at 2:17 AM, Li Yang <[email protected]> wrote: > > > > A customizable policy is great since there's no standard. > > > > On Tue, Aug 11, 2015 at 5:54 AM, Jinfeng Ni <[email protected]> > wrote: > > > >> I think adding a policy in Calcite to derive SUM's return type makes > sense, > >> as SQL standard seems not specify exactly what the return type of SUM > >> should be. As a result, each system could choose to use different > policy in > >> the implementation. > >> > >> Oracle seems to use "the same data type as the numeric data type of the > >> argument" [1], while DB2 uses the policy "The result is a large integer > if > >> the argument values are small integers" [2]. > >> > >> > >> [1] http://docs.oracle.com/database/121/SQLRF/functions196.htm#i89126 > >> [2] > >> > >> > http://public.dhe.ibm.com/ps/products/db2/info/vr101/pdf/en_US/DB2SQLRefVol1-db2s1e1011.pdf > >> > >> On Mon, Aug 10, 2015 at 2:05 PM, Julian Hyde <[email protected]> wrote: > >> > >>> Postgres return type is "bigint for smallint or int arguments, numeric > >> for > >>> bigint arguments, otherwise the same as the argument data type”[1] > >>> > >>> SQL Server return type is int for tinyint, smallint or int; bigint for > >>> bigint[2]. > >>> > >>> I can see your point that “User demands the correct sum result”. But > I’d > >>> also be pissed with Postgres if it returned a numeric (arbitrary > >> precision) > >>> result when I am summing a bigint value. So I don’t think we’re going > to > >>> please everyone. > >>> > >>> I think the solution is to add the policy to derive SUM’s return type > to > >>> as a new method to RelDataTypeSystem. Then Kylin can supply its own. > >>> > >>> Julian > >>> > >>> [1] http://www.postgresql.org/docs/9.1/static/functions-aggregate.html > < > >>> http://www.postgresql.org/docs/9.1/static/functions-aggregate.html> > >>> > >>> [2] https://msdn.microsoft.com/en-us/library/ms187810.aspx < > >>> https://msdn.microsoft.com/en-us/library/ms187810.aspx> > >>> > >>>> On Aug 9, 2015, at 8:09 PM, Li Yang <[email protected]> wrote: > >>>> > >>>>> 1. If x is an integer, what is the type of sum(x)? > >>>> > >>>> This is the key question. If calcite believes sum(int) = int, then > >> Kylin > >>>> have to find solution else where. User demands the correct sum result > >>>> anyway. It's very hard to explain and justify the behavior to user, > >>>> because other SQL engines like Postgres seem simply works... > >>>> > >>>> One workaround maybe let all initial values be bigint. > >>>> > >>>> On Sat, Aug 8, 2015 at 9:02 AM, Julian Hyde <[email protected]> wrote: > >>>> > >>>>> I would approach it a different way. There are 2 separate questions: > >>>>> > >>>>> 1. If x is an integer, what is the type of sum(x)? > >>>>> > >>>>> This concerns SQL query validation. Calcite’s answer is that if x has > >>> type > >>>>> T, then sum(x) has type T. Not perfect, but simple. If x is an int > and > >>> you > >>>>> want the result to be a bigint, just write sum(cast(x as bigint)). > >>>>> > >>>>> 2. Do we detect overflow while calculating sum, and if so, how? > >>>>> > >>>>> This is an implementation question, and needs to be solved in each > >>> engine. > >>>>> Drill is one such engine, and Enumerable is another. Enumerable does > >> not > >>>>> currently detect overflow. > >>>>> > >>>>> One strategy would be to use a higher precision data type internally > >>> (but > >>>>> this strategy works only if you have an upper bound on the number of > >>> input > >>>>> rows). Another is to use a method such as > java.lang.Math.addExact(int, > >>> int). > >>>>> > >>>>> Julian > >>>>> > >>>>> > >>>>> > >>>>>> On Aug 7, 2015, at 10:38 AM, Jinfeng Ni <[email protected]> > >> wrote: > >>>>>> > >>>>>> I think it makes sense to use bigint as the result type for > >>> sum(integer). > >>>>>> > >>>>>> Postgres seems to work in this way. > >>>>>> > >>>>>> mydb=# \d+ emp > >>>>>> Table "public.emp" > >>>>>> Column | Type | Modifiers | Storage | Stats > >> target | > >>>>>> Description > >>>>>> > >>>>> > >>> > >> > ----------+-----------------------+-----------+----------+--------------+------------- > >>>>>> empno | integer | | plain | > >>> | > >>>>>> ..... > >>>>>> > >>>>>> create table tmp as select sum(empno) sum_eno from emp; > >>>>>> > >>>>>> \d+ tmp; > >>>>>> Table "public.tmp" > >>>>>> Column | Type | Modifiers | Storage | Stats target | Description > >>>>>> > ---------+--------+-----------+---------+--------------+------------- > >>>>>> sum_eno | bigint | | plain | | > >>>>>> > >>>>>> > >>>>>> As we can see, the column sum_eno in 'tmp' table after the CTAS > >>> statement > >>>>>> has bigint type. > >>>>>> > >>>>>> In Drill, we also use bigint for sum(integer). Drill has to put > >>>>> additional > >>>>>> logic, since Calcite by default will use int as the result type for > >>>>>> sum(int). > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>> On Fri, Aug 7, 2015 at 6:51 AM, hongbin ma <[email protected]> > >>> wrote: > >>>>>> > >>>>>>> hi, > >>>>>>> > >>>>>>> Suppose I have a table column called "price", its data type is > >>> integer. > >>>>>>> it seems that the sum aggregator in "select sum(price) from table" > >>> will > >>>>>>> return integer type, too. > >>>>>>> > >>>>>>> When I have millions of rows in the table, > >>>>>>> "select sum(price) from table" might overflow, is it a bug? > >>>>>>> Or may I how do you look into this problem? > >>>>>>> > >>>>>>> -- > >>>>>>> Regards, > >>>>>>> > >>>>>>> *Bin Mahone | 马洪宾* > >>>>>>> Apache Kylin: http://kylin.io > >>>>>>> Github: https://github.com/binmahone > >>>>>>> > >>>>> > >>>>> > >>> > >>> > >> > >
