[ 
https://issues.apache.org/jira/browse/DRILL-4494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deneche A. Hakim updated DRILL-4494:
------------------------------------
    Description: 
Window sum over integer column returns incorrect results.

Drill 1.6.0, git commit ID : git.commit.id=64ab0a8e

Note that Drill returns the same value for SUM(c2) in the below query.

{noformat}
0: jdbc:drill:schema=dfs.tmp> select SUM(c2) OVER w as w_sum from ( SELECT * 
FROM t_alltype ORDER BY c1,c2 DESC NULLS LAST ) subquery WINDOW w AS (PARTITION 
BY c8 ORDER BY c2 DESC NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND 
UNBOUNDED FOLLOWING);
+---------------+
|     w_sum     |
+---------------+
| -16323034011  |
| -16323034011  |
| -16323034011  |
...
| -16323034011  |
| -16323034011  |
| -16323034011  |
| -16323034011  |
| -16323034011  |
+---------------+
145 rows selected (0.683 seconds)
{noformat}

Postgres 9.3

{noformat}
postgres=#   select SUM(c2) OVER w as w_sum from ( SELECT * FROM t_alltype 
ORDER BY c1,c2 DESC NULLS LAST ) subquery WINDOW w AS (PARTITION BY c8 ORDER BY 
c2 DESC NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING);
    w_sum
-------------
 -8995559793
 -8995559793
 -8995559793
 -8995559793
 -8995559793
 -8995559793
 -8995559793
 ...
 -9169127412
 -9169127412
 -9169127412
 -9169127412
 -9169127412
 -9169127412
 -9169127412
 -9169127412
 ...
  1841653194
  1841653194
  1841653194
  1841653194
  1841653194
(145 rows)
{noformat}

Both in Drill and on Postgres there are 142 distinct rows and three nulls of 
the 145 total rows, in column c2.

{noformat}
0: jdbc:drill:schema=dfs.tmp> select count(distinct c2) from t_alltype;
+---------+
| EXPR$0  |
+---------+
| 142     |
+---------+
1 row selected (0.682 seconds)
0: jdbc:drill:schema=dfs.tmp> select c2 from t_alltype where c2 is null;
+-------+
|  c2   |
+-------+
| null  |
| null  |
| null  |
+-------+
3 rows selected (0.389 seconds)
{noformat}

{noformat}
postgres=#  select count(distinct c2) from t_alltype;
 count
-------
   142
(1 row)

postgres=# select c2 from t_alltype where c2 is null;
 c2
----



(3 rows)
{noformat}

  was:
Window sum over integer column returns incorrect results.

Drill 1.6.0, git commit ID : git.commit.id=64ab0a8e

{noformat}

Note that Drill returns the same value for SUM(c2) in the below query.

0: jdbc:drill:schema=dfs.tmp> select SUM(c2) OVER w as w_sum from ( SELECT * 
FROM t_alltype ORDER BY c1,c2 DESC NULLS LAST ) subquery WINDOW w AS (PARTITION 
BY c8 ORDER BY c2 DESC NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND 
UNBOUNDED FOLLOWING);
+---------------+
|     w_sum     |
+---------------+
| -16323034011  |
| -16323034011  |
| -16323034011  |
...
| -16323034011  |
| -16323034011  |
| -16323034011  |
| -16323034011  |
| -16323034011  |
+---------------+
145 rows selected (0.683 seconds)
{noformat}

Postgres 9.3

{noformat}
postgres=#   select SUM(c2) OVER w as w_sum from ( SELECT * FROM t_alltype 
ORDER BY c1,c2 DESC NULLS LAST ) subquery WINDOW w AS (PARTITION BY c8 ORDER BY 
c2 DESC NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING);
    w_sum
-------------
 -8995559793
 -8995559793
 -8995559793
 -8995559793
 -8995559793
 -8995559793
 -8995559793
 ...
 -9169127412
 -9169127412
 -9169127412
 -9169127412
 -9169127412
 -9169127412
 -9169127412
 -9169127412
 ...
  1841653194
  1841653194
  1841653194
  1841653194
  1841653194
(145 rows)

Both in Drill and on Postgres there are 142 distinct rows and three nulls of 
the 145 total rows, in column c2.

{noformat}
0: jdbc:drill:schema=dfs.tmp> select count(distinct c2) from t_alltype;
+---------+
| EXPR$0  |
+---------+
| 142     |
+---------+
1 row selected (0.682 seconds)
0: jdbc:drill:schema=dfs.tmp> select c2 from t_alltype where c2 is null;
+-------+
|  c2   |
+-------+
| null  |
| null  |
| null  |
+-------+
3 rows selected (0.389 seconds)
{noformat}

{noformat}
postgres=#  select count(distinct c2) from t_alltype;
 count
-------
   142
(1 row)

postgres=# select c2 from t_alltype where c2 is null;
 c2
----



(3 rows)
{noformat}


> Window sum over integer column returns incorrect results.
> ---------------------------------------------------------
>
>                 Key: DRILL-4494
>                 URL: https://issues.apache.org/jira/browse/DRILL-4494
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Relational Operators
>    Affects Versions: 1.6.0
>         Environment: 4 node cluster CentOS
>            Reporter: Khurram Faraaz
>            Priority: Critical
>              Labels: window_function
>             Fix For: 1.7.0
>
>         Attachments: 0_0_0.parquet, t_alltype.csv
>
>
> Window sum over integer column returns incorrect results.
> Drill 1.6.0, git commit ID : git.commit.id=64ab0a8e
> Note that Drill returns the same value for SUM(c2) in the below query.
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> select SUM(c2) OVER w as w_sum from ( SELECT * 
> FROM t_alltype ORDER BY c1,c2 DESC NULLS LAST ) subquery WINDOW w AS 
> (PARTITION BY c8 ORDER BY c2 DESC NULLS FIRST RANGE BETWEEN UNBOUNDED 
> PRECEDING AND UNBOUNDED FOLLOWING);
> +---------------+
> |     w_sum     |
> +---------------+
> | -16323034011  |
> | -16323034011  |
> | -16323034011  |
> ...
> | -16323034011  |
> | -16323034011  |
> | -16323034011  |
> | -16323034011  |
> | -16323034011  |
> +---------------+
> 145 rows selected (0.683 seconds)
> {noformat}
> Postgres 9.3
> {noformat}
> postgres=#   select SUM(c2) OVER w as w_sum from ( SELECT * FROM t_alltype 
> ORDER BY c1,c2 DESC NULLS LAST ) subquery WINDOW w AS (PARTITION BY c8 ORDER 
> BY c2 DESC NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED 
> FOLLOWING);
>     w_sum
> -------------
>  -8995559793
>  -8995559793
>  -8995559793
>  -8995559793
>  -8995559793
>  -8995559793
>  -8995559793
>  ...
>  -9169127412
>  -9169127412
>  -9169127412
>  -9169127412
>  -9169127412
>  -9169127412
>  -9169127412
>  -9169127412
>  ...
>   1841653194
>   1841653194
>   1841653194
>   1841653194
>   1841653194
> (145 rows)
> {noformat}
> Both in Drill and on Postgres there are 142 distinct rows and three nulls of 
> the 145 total rows, in column c2.
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> select count(distinct c2) from t_alltype;
> +---------+
> | EXPR$0  |
> +---------+
> | 142     |
> +---------+
> 1 row selected (0.682 seconds)
> 0: jdbc:drill:schema=dfs.tmp> select c2 from t_alltype where c2 is null;
> +-------+
> |  c2   |
> +-------+
> | null  |
> | null  |
> | null  |
> +-------+
> 3 rows selected (0.389 seconds)
> {noformat}
> {noformat}
> postgres=#  select count(distinct c2) from t_alltype;
>  count
> -------
>    142
> (1 row)
> postgres=# select c2 from t_alltype where c2 is null;
>  c2
> ----
> (3 rows)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to