[ 
https://issues.apache.org/jira/browse/CALCITE-4157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Rebele updated CALCITE-4157:
-----------------------------------
    Description: 
As shown by [the 
benchmarks|https://github.com/thomasrebele/jmh-micro-benchmarks/blob/98abccad8801532b78a0778cd7be7bd751f90da6/core-java/doc/jmh_partial_sort_jdk1.8.0_241.txt#L6793]
 for CALCITE-3920, the sort with a TreeMap is slower than sorting with 
Arrays.sort(Object[]). The latter takes about 35% less time than sorting with 
TreeMap over a randomized input. While the implementation is not exactly the 
same, it should be close enough to be able to say something about the 
performance of EnumerableDefaults.orderBy(...). The relevant results for this 
issue are the benchmarks with limit=-1 and algorithms treeMap and 
collectionSort.

The speedup might be even better if the input is already sorted by chance 
(i.e., it depends on the actual data during the execution and the planner is 
not aware that the result would be sorted). Modern VMs use TimSort, which 
checks if the input is already sorted.

  was:
As shown by [the 
benchmarks|https://github.com/thomasrebele/jmh-micro-benchmarks/blob/98abccad8801532b78a0778cd7be7bd751f90da6/core-java/doc/jmh_partial_sort_jdk1.8.0_241.txt#L6793]
 for CALCITE-3920, the sort with a TreeMap is slower than sorting with 
Arrays.sort(Object[]). The latter takes about 35% less time than sorting with 
TreeMap over a randomized input. While the implementation is not exactly the 
same, it should be close enough to be able to say something about the 
performance of EnumerableDefaults.orderBy(...). The relevant results for this 
issue are the benchmarks with limit=-1 and algorithms treeMap and 
collectionSort.

The speedup might be even better if the input is already sorted, as modern VMs 
use TimSort, which checks if the input is already sorted.


> Use a different sort algorithm for EnumerableDefaults.orderBy(...)
> ------------------------------------------------------------------
>
>                 Key: CALCITE-4157
>                 URL: https://issues.apache.org/jira/browse/CALCITE-4157
>             Project: Calcite
>          Issue Type: Improvement
>            Reporter: Thomas Rebele
>            Priority: Minor
>
> As shown by [the 
> benchmarks|https://github.com/thomasrebele/jmh-micro-benchmarks/blob/98abccad8801532b78a0778cd7be7bd751f90da6/core-java/doc/jmh_partial_sort_jdk1.8.0_241.txt#L6793]
>  for CALCITE-3920, the sort with a TreeMap is slower than sorting with 
> Arrays.sort(Object[]). The latter takes about 35% less time than sorting with 
> TreeMap over a randomized input. While the implementation is not exactly the 
> same, it should be close enough to be able to say something about the 
> performance of EnumerableDefaults.orderBy(...). The relevant results for this 
> issue are the benchmarks with limit=-1 and algorithms treeMap and 
> collectionSort.
> The speedup might be even better if the input is already sorted by chance 
> (i.e., it depends on the actual data during the execution and the planner is 
> not aware that the result would be sorted). Modern VMs use TimSort, which 
> checks if the input is already sorted.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to