[
https://issues.apache.org/jira/browse/DRILL-3358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14600345#comment-14600345
]
Khurram Faraaz commented on DRILL-3358:
---------------------------------------
cume_dist() returns different (wrong) results when partition by and order by
clause are used with another order by outside the window definition.
Results from Drill
{code}
0: jdbc:drill:schema=dfs.tmp> select rank() over w rnk, percent_rank() over w
prct_rnk, dense_rank() over w denseRank, row_number() over w row_num,
cume_dist() over w cumedist, col_char_2 from vwOnParq_wCst window w as
(partition by col_char_2 order by col_int) order by 1;
+------+-----------+------------+----------+---------------------+-------------+
| rnk | prct_rnk | denseRank | row_num | cumedist | col_char_2 |
+------+-----------+------------+----------+---------------------+-------------+
| 1 | 0.0 | 1 | 1 | 1.0 | AZ |
| 1 | 0.0 | 1 | 1 | 0.5 | CO |
| 1 | 0.0 | 1 | 1 | 1.0 | FL |
| 1 | 0.0 | 1 | 1 | 0.5 | GA |
| 1 | 0.0 | 1 | 1 | 1.0 | HI |
| 1 | 0.0 | 1 | 1 | 0.5 | IN |
| 1 | 0.0 | 1 | 1 | 0.3333333333333333 | MA |
| 1 | 0.0 | 1 | 1 | 0.5 | MD |
| 1 | 0.0 | 1 | 1 | 1.0 | MI |
| 1 | 0.0 | 1 | 1 | 0.5 | MN |
| 1 | 0.0 | 1 | 1 | 1.0 | MO |
| 1 | 0.0 | 1 | 1 | 0.5 | ND |
| 1 | 0.0 | 1 | 1 | 1.0 | OR |
| 1 | 0.0 | 1 | 1 | 0.3333333333333333 | RI |
| 1 | 0.0 | 1 | 1 | 0.5 | SD |
| 1 | 0.0 | 1 | 1 | 0.5 | VT |
| 1 | 0.0 | 1 | 1 | 1.0 | WI |
| 1 | 0.0 | 1 | 1 | 1.0 | WY |
| 2 | 1.0 | 2 | 2 | 1.0 | MN |
| 2 | 0.5 | 2 | 2 | 0.6666666666666666 | RI |
| 2 | 1.0 | 2 | 2 | 1.0 | IN |
| 2 | 0.5 | 2 | 2 | 0.6666666666666666 | MA |
| 2 | 1.0 | 2 | 2 | 1.0 | ND |
| 2 | 1.0 | 2 | 2 | 1.0 | MD |
| 2 | 1.0 | 2 | 2 | 1.0 | CO |
| 2 | 1.0 | 2 | 2 | 1.0 | VT |
| 2 | 1.0 | 2 | 2 | 1.0 | SD |
| 2 | 1.0 | 2 | 2 | 1.0 | GA |
| 3 | 1.0 | 3 | 3 | 1.0 | RI |
| 3 | 1.0 | 3 | 3 | 1.0 | MA |
+------+-----------+------------+----------+---------------------+-------------+
30 rows selected (0.579 seconds)
{code}
Results from Postgres
{code}
postgres=# select rank() over w rnk, percent_rank() over w prct_rnk,
dense_rank() over w denseRank, row_number() over w row_num, cume_dist() over w
cumedist, col_char_2 from vwOnParq_wCst window w as (partition by col_char_2
order by col_int) order by 1;
rnk | prct_rnk | denserank | row_num | cumedist | col_char_2
-----+----------+-----------+---------+-------------------+------------
1 | 0 | 1 | 1 | 1 | WY
1 | 0 | 1 | 1 | 0.5 | CO
1 | 0 | 1 | 1 | 0.333333333333333 | MA
1 | 0 | 1 | 1 | 0.5 | MD
1 | 0 | 1 | 1 | 1 | MI
1 | 0 | 1 | 1 | 0.5 | MN
1 | 0 | 1 | 1 | 1 | MO
1 | 0 | 1 | 1 | 0.5 | ND
1 | 0 | 1 | 1 | 1 | OR
1 | 0 | 1 | 1 | 0.333333333333333 | RI
1 | 0 | 1 | 1 | 0.5 | SD
1 | 0 | 1 | 1 | 0.5 | VT
1 | 0 | 1 | 1 | 1 | WI
1 | 0 | 1 | 1 | 1 | AZ
1 | 0 | 1 | 1 | 1 | FL
1 | 0 | 1 | 1 | 0.5 | GA
1 | 0 | 1 | 1 | 1 | HI
1 | 0 | 1 | 1 | 0.5 | IN
2 | 1 | 2 | 2 | 1 | MD
2 | 1 | 2 | 2 | 1 | VT
2 | 1 | 2 | 2 | 1 | SD
2 | 1 | 2 | 2 | 1 | GA
2 | 1 | 2 | 2 | 1 | MN
2 | 0.5 | 2 | 2 | 0.666666666666667 | RI
2 | 1 | 2 | 2 | 1 | IN
2 | 1 | 2 | 2 | 1 | ND
2 | 1 | 2 | 2 | 1 | CO
2 | 0.5 | 2 | 2 | 0.666666666666667 | MA
3 | 1 | 3 | 3 | 1 | RI
3 | 1 | 3 | 3 | 1 | MA
(30 rows)
{code}
> CUME_DIST window function provides wrong result when only ORDER BY clause is
> specified
> --------------------------------------------------------------------------------------
>
> Key: DRILL-3358
> URL: https://issues.apache.org/jira/browse/DRILL-3358
> Project: Apache Drill
> Issue Type: Bug
> Components: Query Planning & Optimization
> Affects Versions: 1.1.0
> Reporter: Abhishek Girish
> Assignee: Deneche A. Hakim
> Labels: window_function
> Fix For: 1.1.0
>
>
> *Drill:*
> {code:sql}
> > SELECT CUME_DIST() OVER (ORDER BY ss.ss_store_sk) FROM store_sales ss ORDER
> > BY 1 LIMIT 20;
> +---------------------+
> | EXPR$0 |
> +---------------------+
> | 0.9923989432198661 |
> | 0.9923989432198661 |
> | 0.9923989432198661 |
> | 0.9923989432198661 |
> | 0.9923989432198661 |
> | 0.9923989432198661 |
> | 0.9923989432198661 |
> | 0.9923989432198661 |
> | 0.9923989432198661 |
> | 0.9923989432198661 |
> | 0.9923989432198661 |
> | 0.9923989432198661 |
> | 0.9923989432198661 |
> | 0.9923989432198661 |
> | 0.9923989432198661 |
> | 0.9923989432198661 |
> | 0.9923989432198661 |
> | 0.9923989432198661 |
> | 0.9923989432198661 |
> | 0.9923989432198661 |
> +---------------------+
> 20 rows selected (17.317 seconds)
> {code}
> *Postgres*
> {code:sql}
> # SELECT CUME_DIST() OVER (ORDER BY ss.ss_store_sk) FROM store_sales ss ORDER
> BY 1 LIMIT 20;
> cume_dist
> -------------------
> 0.158622193275665
> 0.158622193275665
> 0.158622193275665
> 0.158622193275665
> 0.158622193275665
> 0.158622193275665
> 0.158622193275665
> 0.158622193275665
> 0.158622193275665
> 0.158622193275665
> 0.158622193275665
> 0.158622193275665
> 0.158622193275665
> 0.158622193275665
> 0.158622193275665
> 0.158622193275665
> 0.158622193275665
> 0.158622193275665
> 0.158622193275665
> 0.158622193275665
> (20 rows)
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)