[
https://issues.apache.org/jira/browse/DRILL-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15183069#comment-15183069
]
Deneche A. Hakim commented on DRILL-4469:
-----------------------------------------
Looking at the query plan:
{noformat}
00-00 Screen
00-01 Project(EXPR$0=[$0])
00-02 Project(w0$o0=[$2])
00-03 Window(window#0=[window(partition {0} order by [0] range between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING aggs [SUM($1)])])
00-04 SelectionVectorRemover
00-05 Sort(sort0=[$0], sort1=[$0], dir0=[ASC], dir1=[ASC])
00-06 Project(T1¦¦*=[$0], $1=[ITEM($0, 'c1')])
00-07 Project(T1¦¦*=[$0])
00-08 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/Users/hakim/MapR/data/t_alltype.parquet]], selectionRoot=file:/Users/hakim/MapR/data/t_alltype.parquet, numFiles=1, usedMetadataFile=false, columns=[`*`]]])
{noformat}
Something's wrong: we only expand the 'c1' column and never expand the 'c8'
column. Also, although the sort uses 2 sort keys, both reference the same
column, 'T1¦¦*' ($0), and the Window operator likewise partitions and orders on $0.
By contrast, the following query gives correct results; its plan is shown
right below:
{noformat}
SELECT SUM(c1) OVER(PARTITION BY c8 ORDER BY c1 RANGE BETWEEN UNBOUNDED
PRECEDING AND UNBOUNDED FOLLOWING) FROM (select * from `t_alltype.parquet`);
{noformat}
{noformat}
00-00 Screen
00-01 Project(EXPR$0=[$0])
00-02 Project($0=[$2])
00-03 Window(window#0=[window(partition {1} order by [0] range between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING aggs [SUM($0)])])
00-04 SelectionVectorRemover
00-05 Sort(sort0=[$1], sort1=[$0], dir0=[ASC], dir1=[ASC])
00-06 Project($0=[ITEM($0, 'c1')], $1=[ITEM($0, 'c8')])
00-07 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/Users/hakim/MapR/data/t_alltype.parquet]], selectionRoot=file:/Users/hakim/MapR/data/t_alltype.parquet, numFiles=1, usedMetadataFile=false, columns=[`*`]]])
{noformat}
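To make the expected semantics concrete, here is a minimal pure-Python sketch (not Drill's implementation) of the Sort then Window shape seen in both plans: rows are sorted on (partition key, order key), and because the frame is RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING, every row in a partition receives that partition's SUM. The function name `sum_over_partition` and the sample rows are hypothetical, not taken from t_alltype.

```python
from itertools import groupby
from typing import Any, Callable, List, Tuple

# Hypothetical rows of (c1, c8); not the actual t_alltype data.
Row = Tuple[int, str]


def sum_over_partition(rows: List[Row],
                       partition_key: Callable[[Row], Any],
                       order_key: Callable[[Row], Any]) -> List[int]:
    """SUM(c1) OVER (PARTITION BY <partition_key> ORDER BY <order_key>
    RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING).

    Returns one value per row, in the sorted output order."""
    # Sort on (partition key, order key), like the Sort below the Window operator.
    ordered = sorted(rows, key=lambda r: (partition_key(r), order_key(r)))
    out: List[int] = []
    # After sorting, each partition is a contiguous run of equal partition keys.
    for _, group in groupby(ordered, key=partition_key):
        part = list(group)
        total = sum(c1 for c1, _ in part)  # the frame spans the whole partition
        out.extend([total] * len(part))
    return out


rows = [(3, "x"), (1, "x"), (2, "x"), (20, "y"), (10, "y")]
# Partitioning by c8, as the correct plan does: one total per partition.
print(sum_over_partition(rows, lambda r: r[1], lambda r: r[0]))  # [6, 6, 6, 30, 30]
# A constant partition key puts every row in one partition: the grand total.
print(sum_over_partition(rows, lambda r: 0, lambda r: r[0]))     # [36, 36, 36, 36, 36]
```

A constant partition key collapses the input into a single partition, so every row receives the grand total. In the Postgres output quoted below, the visible partition totals 4499 + 5613 + 473 add up to 10585, the value Drill returns on every row. That is consistent with (though, since rows are elided, not proof of) Drill treating the whole input as one partition.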
> SUM window query returns incorrect results over integer data
> ------------------------------------------------------------
>
> Key: DRILL-4469
> URL: https://issues.apache.org/jira/browse/DRILL-4469
> Project: Apache Drill
> Issue Type: Bug
> Components: Query Planning & Optimization
> Affects Versions: 1.6.0
> Environment: 4 node CentOS cluster
> Reporter: Khurram Faraaz
> Priority: Critical
> Labels: window_function
> Attachments: t_alltype.csv, t_alltype.parquet
>
>
> SUM window query returns incorrect results compared to Postgres, with or
> without the frame clause in the window definition. Note that a subquery is
> involved and that the data in column c1 is sorted integer data with no nulls.
> Drill 1.6.0 commit ID: 6d5f4983
> Results from Drill 1.6.0
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> SELECT SUM(c1) OVER w FROM (select * from
> dfs.tmp.`t_alltype`) subQry WINDOW w AS (PARTITION BY c8 ORDER BY c1 RANGE
> BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING);
> +---------+
> | EXPR$0 |
> +---------+
> | 10585 |
> | 10585 |
> | 10585 |
> | 10585 |
> | 10585 |
> | 10585 |
> ...
> | 10585 |
> | 10585 |
> | 10585 |
> +--------+
> 145 rows selected (0.257 seconds)
> {noformat}
> Results from Postgres 9.3
> {noformat}
> postgres=# SELECT SUM(c1) OVER w FROM (select * from t_alltype) subQry WINDOW
> w AS (PARTITION BY c8 ORDER BY c1 RANGE BETWEEN UNBOUNDED PRECEDING AND
> UNBOUNDED FOLLOWING);
> sum
> ------
> 4499
> 4499
> 4499
> 4499
> 4499
> 4499
> ...
> 5613
> 5613
> 5613
> 473
> 473
> 473
> 473
> 473
> (145 rows)
> {noformat}
> Removing the frame clause from the window definition still produces completely
> different results on Postgres and Drill.
> Results from Drill 1.6.0
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> SELECT SUM(c1) OVER w FROM (select * from
> t_alltype) subQry WINDOW w AS (PARTITION BY c8 ORDER BY c1);
> +---------+
> | EXPR$0 |
> +---------+
> | 10585 |
> | 10585 |
> | 10585 |
> | 10585 |
> | 10585 |
> | 10585 |
> | 10585 |
> | 10585 |
> | 10585 |
> ...
> | 10585 |
> | 10585 |
> | 10585 |
> | 10585 |
> | 10585 |
> +--------+
> 145 rows selected (0.28 seconds)
> {noformat}
> Results from Postgres
> {noformat}
> postgres=# SELECT SUM(c1) OVER w FROM (select * from t_alltype) subQry WINDOW
> w AS (PARTITION BY c8 ORDER BY c1);
> sum
> ------
> 5
> 12
> 21
> 33
> 47
> 62
> 78
> 96
> 115
> 135
> 158
> 182
> 207
> 233
> 260
> 289
> ...
> 4914
> 5051
> 5189
> 5328
> 5470
> 5613
> 8
> 70
> 198
> 332
> 473
> (145 rows)
> {noformat}
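Without an explicit frame clause, an ORDER BY in the window definition gives the SQL-standard default frame RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which is why Postgres returns running sums that restart at each c8 partition. Below is a minimal pure-Python sketch of that frame, again with hypothetical rows and a hypothetical function name (`running_sum_default_frame`); rows tied on c1 (peers) share one value because a RANGE frame includes all peers of the current row.

```python
from itertools import groupby
from typing import List, Tuple

# Hypothetical rows of (c1, c8); not the actual t_alltype data.
Row = Tuple[int, str]


def running_sum_default_frame(rows: List[Row]) -> List[int]:
    """SUM(c1) OVER (PARTITION BY c8 ORDER BY c1), i.e. the default frame
    RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW."""
    # Sort on (partition key, order key), as the Sort operator in the plan does.
    ordered = sorted(rows, key=lambda r: (r[1], r[0]))
    out: List[int] = []
    for _, part in groupby(ordered, key=lambda r: r[1]):
        acc = 0  # running total restarts at each partition boundary
        # Rows with equal c1 are peers: add them all before emitting any of them.
        for _, peers in groupby(part, key=lambda r: r[0]):
            peer_rows = list(peers)
            acc += sum(c1 for c1, _ in peer_rows)
            out.extend([acc] * len(peer_rows))
    return out


rows = [(1, "x"), (2, "x"), (2, "x"), (4, "x"), (10, "y"), (5, "y")]
print(running_sum_default_frame(rows))  # [1, 5, 5, 9, 5, 15]
```

The inner `groupby` is what distinguishes RANGE from ROWS here: with a ROWS frame the two peers with c1 = 2 would get 3 and 5 instead of 5 and 5.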