[ 
https://issues.apache.org/jira/browse/CALCITE-3920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17172143#comment-17172143
 ] 

Ruben Q L commented on CALCITE-3920:
------------------------------------

I understand that having our own new sort algorithm is risky, but I think we 
can take a conservative approach and move on:
- Provide the new rule (e.g. HeapSortRule), which will not be part of the 
"standard ruleset", so it should not impact any existing test / downstream 
project, but will be available to be used on demand.
- This new rule can have an extra boolean parameter (stable, by default false). 
If {{stable==false}}, {{java.util.PriorityQueue}} will be used for the 
implementation. If {{stable==true}}, {{LimitSort}} will be used.
- As [~thomas.rebele] proposes, we will include {{LimitSort}} algorithm into 
our code, with the appropriate documentation and the corresponding unit tests, 
including randomized tests to verify that it sorts properly. If there is any 
error we shall see it.

> Improve ORDER BY computation in Enumerable convention by exploiting LIMIT
> -------------------------------------------------------------------------
>
>                 Key: CALCITE-3920
>                 URL: https://issues.apache.org/jira/browse/CALCITE-3920
>             Project: Calcite
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 1.22.0
>            Reporter: Stamatis Zampetakis
>            Assignee: Thomas Rebele
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> There are many use-cases (pagination, top-k) relying on queries with an ORDER 
> BY clause followed by a LIMIT. 
> At the moment, the two operations are implemented independently the one from 
> the other 
>  in the Enumerable convention. Even when we know that consumer needs only the 
> top-10 results the sort operation will try to maintain its entire input 
> sorted. The complexity of the sorting operation is O( n ) space and O( nlogn 
> ) time, where n is the size of the input. 
> By implementing ORDER BY and LIMIT together there are various optimizations 
> that can be applied to reduce the space and time complexity of the sorting 
> algorithm.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to