[ 
https://issues.apache.org/jira/browse/CALCITE-3920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17173006#comment-17173006
 ] 

Stamatis Zampetakis commented on CALCITE-3920:
----------------------------------------------

I am in favor of adding the new rule to the standard ruleset; I think it is 
going to be beneficial for most users relying on the Enumerable convention. 

I would prefer to skip the extra config parameter (stable or not stable) and 
conclude to use either LimitSort or PriorityQueue as the algorithm of choice 
(I'm fine with any of those). That way we don't end up having code that is not 
really used by someone. 

TL;DR
[~thomas.rebele] I saw in the benchmarks that you used some custom counters to 
measure the number of comparisons which I think is a good idea. In case you 
didn't know there are some 
[profilers|https://hg.openjdk.java.net/code-tools/jmh/file/dab3a1912d28/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_35_Profilers.java]
 (e.g., LinuxPerfProfiler) that can be used with JMH to obtain some interesting 
metrics such as the number of CPU instructions or cycles per instruction  (CPI) 
which can provide more insights than what can be shown with high level 
counters. Note that I am not asking additional experiments, I am sharing this 
info in case it is useful in the future. 

> Improve ORDER BY computation in Enumerable convention by exploiting LIMIT
> -------------------------------------------------------------------------
>
>                 Key: CALCITE-3920
>                 URL: https://issues.apache.org/jira/browse/CALCITE-3920
>             Project: Calcite
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 1.22.0
>            Reporter: Stamatis Zampetakis
>            Assignee: Thomas Rebele
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> There are many use-cases (pagination, top-k) relying on queries with an ORDER 
> BY clause followed by a LIMIT. 
> At the moment, the two operations are implemented independently the one from 
> the other 
>  in the Enumerable convention. Even when we know that consumer needs only the 
> top-10 results the sort operation will try to maintain its entire input 
> sorted. The complexity of the sorting operation is O( n ) space and O( nlogn 
> ) time, where n is the size of the input. 
> By implementing ORDER BY and LIMIT together there are various optimizations 
> that can be applied to reduce the space and time complexity of the sorting 
> algorithm.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to