[ 
https://issues.apache.org/jira/browse/DRILL-4494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Khurram Faraaz updated DRILL-4494:
----------------------------------
    Attachment: t_alltype.csv
                0_0_0.parquet

Attached parquet and CSV file here. Table definition in Postgres

{noformat}
postgres=# \d t_alltype
 c1     | integer                     |
 c2     | integer                     |
 c3     | bigint                      |
 c4     | character varying(256)      |
 c5     | character varying(256)      |
 c6     | timestamp without time zone |
 c7     | date                        |
 c8     | boolean                     |
 c9     | double precision            |
{noformat}

> Window sum over integer column returns incorrect results.
> ---------------------------------------------------------
>
>                 Key: DRILL-4494
>                 URL: https://issues.apache.org/jira/browse/DRILL-4494
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>    Affects Versions: 1.6.0
>         Environment: 4 node cluster CentOS
>            Reporter: Khurram Faraaz
>            Priority: Critical
>              Labels: window_function
>         Attachments: 0_0_0.parquet, t_alltype.csv
>
>
> Window sum over integer column returns incorrect results.
> Drill 1.6.0, git commit ID : git.commit.id=64ab0a8e
> {noformat}
> Note that Drill returns the same value for SUM(c2) in the below query.
> 0: jdbc:drill:schema=dfs.tmp> select SUM(c2) OVER w as w_sum from ( SELECT * 
> FROM t_alltype ORDER BY c1,c2 DESC NULLS LAST ) subquery WINDOW w AS 
> (PARTITION BY c8 ORDER BY c2 DESC NULLS FIRST RANGE BETWEEN UNBOUNDED 
> PRECEDING AND UNBOUNDED FOLLOWING);
> +---------------+
> |     w_sum     |
> +---------------+
> | -16323034011  |
> | -16323034011  |
> | -16323034011  |
> ...
> | -16323034011  |
> | -16323034011  |
> | -16323034011  |
> | -16323034011  |
> | -16323034011  |
> +---------------+
> 145 rows selected (0.683 seconds)
> {noformat}
> Postgres 9.3
> {noformat}
> postgres=#   select SUM(c2) OVER w as w_sum from ( SELECT * FROM t_alltype 
> ORDER BY c1,c2 DESC NULLS LAST ) subquery WINDOW w AS (PARTITION BY c8 ORDER 
> BY c2 DESC NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED 
> FOLLOWING);
>     w_sum
> -------------
>  -8995559793
>  -8995559793
>  -8995559793
>  -8995559793
>  -8995559793
>  -8995559793
>  -8995559793
>  ...
>  -9169127412
>  -9169127412
>  -9169127412
>  -9169127412
>  -9169127412
>  -9169127412
>  -9169127412
>  -9169127412
>  ...
>   1841653194
>   1841653194
>   1841653194
>   1841653194
>   1841653194
> (145 rows)
> Both in Drill and on Postgres there are 142 distinct rows and three nulls of 
> the 145 total rows, in column c2.
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> select count(distinct c2) from t_alltype;
> +---------+
> | EXPR$0  |
> +---------+
> | 142     |
> +---------+
> 1 row selected (0.682 seconds)
> 0: jdbc:drill:schema=dfs.tmp> select c2 from t_alltype where c2 is null;
> +-------+
> |  c2   |
> +-------+
> | null  |
> | null  |
> | null  |
> +-------+
> 3 rows selected (0.389 seconds)
> {noformat}
> {noformat}
> postgres=#  select count(distinct c2) from t_alltype;
>  count
> -------
>    142
> (1 row)
> postgres=# select c2 from t_alltype where c2 is null;
>  c2
> ----
> (3 rows)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to