[
https://issues.apache.org/jira/browse/DRILL-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17093745#comment-17093745
]
Paul Rogers commented on DRILL-7721:
------------------------------------
Thanks for the error report. I'm actually surprised that the external sort is
stable; probably only works on a single file with a single batch. The Top-N
operator is implemented as a priority queue, which is not stable.
Drill follows SQL semantics which does not require sorts to be stable. In SQL,
data is inherently unordered (relations are sets), except with an order imposed
by {{ORDER BY}}.
So, the best workaround is to include more columns in your key, where
{{lineno}} is a column that I just made up that gives the lines in your file in
the order you want them displayed:
{code:sql}
select * from dfs.test_sort s order by s.index, s.lineno LIMIT 100
{code}
It is very common, in the classic "customers/orders/items" schema, for the
lines in an order to have a line number for this very reason.
> Inconsistent behavior of external_sort and topN
> -----------------------------------------------
>
> Key: DRILL-7721
> URL: https://issues.apache.org/jira/browse/DRILL-7721
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.17.0
> Reporter: Oleg Zinoviev
> Priority: Major
> Attachments: sort.png, topn.png
>
>
> The TopN operator with the same sort field value can change the order of the
> rows relative to the original (unstable sort). The External_sort operator
> preserves the original row order (at least in the example). This
> inconsistency breaks the implementation of data pagination.
> Example:
> 1) Create table
> {code:sql}
> create table dfs.test_sort as (
> select 1 as index, 0 as value
> union all
> select 1 as index, 1 as value
> union all
> select 1 as index, -1 as value
> union all
> select 2 as index, 0 as value
> union all
> select 2 as index, 1 as value
> union all
> select 2 as index, -1 as value
> union all
> select -1 as index, 0 as value
> union all
> select -1 as index, 1 as value
> union all
> select -1 as index, -1 as value
> )
> {code}
> 2) Execute query:
> {code:sql}
> select * from dfs.test_sort s order by s.index limit 100 // Should be TopN
> sort
> {code}
> 3) Execute query:
> {code:sql}
> select * from dfs.test_sort s order by s.index limit 1000 // Should be
> External_sort
> {code}
> 4) Compare results:)
> !topn.png! !sort.png!
--
This message was sent by Atlassian Jira
(v8.3.4#803005)