[jira] [Commented] (CALCITE-3998) Bad datatype for sum(n), it should be BIGINT but it is sometimes INTEGER

Enrico Olivelli (Jira) Thu, 14 May 2020 05:25:25 -0700


    [ 
https://issues.apache.org/jira/browse/CALCITE-3998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17107254#comment-17107254
 ]


Enrico Olivelli commented on CALCITE-3998:
------------------------------------------

In this case Calcite looks very smart !
As K1 is the PK sum(n1) will aggregate only one an thus there is no risk of 
integer overflow.

As a counter example if I create the table without a PK

{code:java}
CREATE TABLE tblspace1.tsql (k1 string ,n1 int,s1 string)
{code}
instead of 
{code:java}
CREATE TABLE tblspace1.tsql (k1 string PRIMARY KEY ,n1 int,s1 string)
{code}

the data type is now BIGINT as before.

So we can close this issue as "not a problem"
It demonstrates that Calcite 1.23 is smarter than 1.22 !!

thank you [~winipanda] for your time in this issue


> Bad datatype for sum(n), it should be BIGINT but it is sometimes INTEGER
> ------------------------------------------------------------------------
>
>                 Key: CALCITE-3998
>                 URL: https://issues.apache.org/jira/browse/CALCITE-3998
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.23.0
>            Reporter: Enrico Olivelli
>            Priority: Blocker
>         Attachments: image-2020-05-14-15-37-14-571.png, 
> image-2020-05-14-15-39-28-279.png, image-2020-05-14-16-15-59-907.png, 
> image-2020-05-14-17-07-49-157.png, image-2020-05-14-17-12-11-277.png
>
>
> I also noted that sometimes the type of sum(N) where N is an INTEGER column 
> sometimes it is now reported by Calcite as INTEGER and sometimes as a BIGINT. 
> In 1.22 every time is reported as BIGINT.
> So we have another test failing.
> SELECT sum(n1), count(*) as cc, k1
> FROM tblspace1.tsql
> GROUP by k1
> ORDER BY sum(n1)
> Here sum(n1) is reported now a INTEGER, previously it was a BIGINT. I would 
> prefer to see it as a BIGINT in order to prevent overflows
> Here are the plans:
> {noformat}
> INFO: Query: SELECT sum(n1), count(*) as cc, k1  FROM tblspace1.tsql GROUP by 
> k1 ORDER BY sum(n1) -- Logical Plan
> LogicalSort(sort0=[$0], dir0=[ASC]): rowcount = 2.0, cumulative cost = 
> {10.525000095367432 rows, 37.0 cpu, 0.0 io}, id = 1038
>   LogicalProject(EXPR$0=[$1], CC=[$2], K1=[$0]): rowcount = 2.0, cumulative 
> cost = {8.525000095367432 rows, 13.0 cpu, 0.0 io}, id = 1037
>     LogicalAggregate(group=[{0}], EXPR$0=[SUM($1)], CC=[COUNT()]): rowcount = 
> 2.0, cumulative cost = {6.525000095367432 rows, 7.0 cpu, 0.0 io}, id = 1035
>       LogicalProject(K1=[$0], n1=[$1]): rowcount = 2.0, cumulative cost = 
> {4.0 rows, 7.0 cpu, 0.0 io}, id = 1034
>         LogicalTableScan(table=[[tblspace1, tsql]]): rowcount = 2.0, 
> cumulative cost = {2.0 rows, 3.0 cpu, 0.0 io}, id = 1032
> May 12, 2020 11:07:37 AM herddb.sql.CalcitePlanner runPlanner
> INFO: Query: SELECT sum(n1), count(*) as cc, k1  FROM tblspace1.tsql GROUP by 
> k1 ORDER BY sum(n1) -- Best  Plan
> EnumerableSort(sort0=[$0], dir0=[ASC]): rowcount = 2.0, cumulative cost = 
> {5.0 rows, 31.0 cpu, 0.0 io}, id = 1245
>   EnumerableProject(EXPR$0=[$1], CC=[1:BIGINT], K1=[$0]): rowcount = 2.0, 
> cumulative cost = {3.0 rows, 7.0 cpu, 0.0 io}, id = 1244
>     EnumerableInterpreter: rowcount = 2.0, cumulative cost = {1.0 rows, 1.0 
> cpu, 0.0 io}, id = 1243
>       BindableTableScan(table=[[tblspace1, tsql]], projects=[[0, 1]]): 
> rowcount = 2.0, cumulative cost = {0.016 rows, 0.024 cpu, 0.0 io}, id = 1055
> {noformat}
> Within the same test case with the same tables the result of this query is 
> not changed
> SELECT sum(n1) as ss, min(n1) as mi, max(n1) as ma FROM tblspace1.tsql
> INFO: Query: SELECT sum(n1) as ss, min(n1) as mi, max(n1) as ma FROM 
> tblspace1.tsql -- Logical Plan
> {noformat}
> LogicalAggregate(group=[{}], SS=[SUM($0)], MI=[MIN($0)], MA=[MAX($0)]): 
> rowcount = 1.0, cumulative cost = {5.387500047683716 rows, 5.0 cpu, 0.0 io}, 
> id = 1253
>   LogicalProject(n1=[$1]): rowcount = 2.0, cumulative cost = {4.0 rows, 5.0 
> cpu, 0.0 io}, id = 1252
>     LogicalTableScan(table=[[tblspace1, tsql]]): rowcount = 2.0, cumulative 
> cost = {2.0 rows, 3.0 cpu, 0.0 io}, id = 1250
> May 12, 2020 11:08:48 AM herddb.sql.CalcitePlanner runPlanner
> INFO: Query: SELECT sum(n1) as ss, min(n1) as mi, max(n1) as ma FROM 
> tblspace1.tsql -- Best  Plan
> EnumerableAggregate(group=[{}], SS=[SUM($0)], MI=[MIN($0)], MA=[MAX($0)]): 
> rowcount = 1.0, cumulative cost = {2.387500047683716 rows, 1.0 cpu, 0.0 io}, 
> id = 1295
>   EnumerableInterpreter: rowcount = 2.0, cumulative cost = {1.0 rows, 1.0 
> cpu, 0.0 io}, id = 1294
>     BindableTableScan(table=[[tblspace1, tsql]], projects=[[1]]): rowcount = 
> 2.0, cumulative cost = {0.012 rows, 0.018000000000000002 cpu, 0.0 io}, id = 
> 1265
> {noformat}
> This is the test on HerdDB
> https://github.com/diennea/herddb/blob/vote-calcite-123/herddb-core/src/test/java/herddb/sql/SimplerPlannerTest.java#L237



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Commented] (CALCITE-3998) Bad datatype for sum(n), it should be BIGINT but it is sometimes INTEGER

Reply via email to