[
https://issues.apache.org/jira/browse/DRILL-3595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652856#comment-14652856
]
Jason Altekruse commented on DRILL-3595:
----------------------------------------
One thing we should make sure to watch out for as we generate test cases, the
limit operation should be used very sparingly. The primary reason for this is
that limit is defined at the SQL level to guarantee nothing about where in the
result set you are getting the records from. Small cases where there is no
parallelization can avoid these issues as a product of how Drill is implemented
today, but this should not be relied upon to write robust tests. In reality,
operations in Drill happen on several threads and across machines, the order
that records will be returned from operations other than a global sort at the
end of the query is not deterministic.
Here the window function used specifies an order by, but this does not enforce
anything about a global ordering of all of the records produced by the
evaluation of the window function. This only enforces order within each
partition.
This is very likely a legitimate bug, but the better way to test it would be to
read a smaller input file, or run a validation against the entire large result
set.
> Wrong results returned by query that uses LEAD window function
> --------------------------------------------------------------
>
> Key: DRILL-3595
> URL: https://issues.apache.org/jira/browse/DRILL-3595
> Project: Apache Drill
> Issue Type: Bug
> Components: Execution - Codegen, Execution - Flow
> Affects Versions: 1.2.0
> Environment: private-branch-new-window-funcs
> Reporter: Khurram Faraaz
> Assignee: Deneche A. Hakim
> Labels: window_function
> Fix For: 1.2.0
>
> Attachments: union_01.parquet
>
>
> Query that uses LEAD window function returns wrong results on developer's
> private branch (new-window-funcs).
> Results returned by Drill
> {code}
> 0: jdbc:drill:schema=dfs.tmp> select lead(c1) over w from union_01 window w
> as (partition by c3 order by c1) limit 10;
> +---------+
> | EXPR$0 |
> +---------+
> | 878 |
> | -150 |
> | 402 |
> | 402 |
> | 402 |
> | 402 |
> | 402 |
> | 160 |
> | 160 |
> | 160 |
> +---------+
> 10 rows selected (0.349 seconds)
> {code}
> Results returned by Postgres
> {code}
> postgres=# select lead(c1) over w from union_01 window w as (partition by c3
> order by c1) limit 10;
> lead
> ------
>
>
> 402
> 402
> 402
> 402
> 402
>
> 160
> 160
> (10 rows)
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)