[ https://issues.apache.org/jira/browse/DRILL-5350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Paul Rogers updated DRILL-5350:
-------------------------------
Issue Type: Sub-task (was: Improvement)
Parent: DRILL-5325
> Performance: skip merge for single-batch sort
> ---------------------------------------------
>
> Key: DRILL-5350
> URL: https://issues.apache.org/jira/browse/DRILL-5350
> Project: Apache Drill
> Issue Type: Sub-task
> Affects Versions: 1.10.0
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Priority: Minor
> Fix For: 1.11.0
>
>
> The external sort uses the classic two-step sort/merge process:
> * Sort each incoming batch. (Optionally spill batches when needed.)
> * Merge batches to create the final output.
> The external sort uses two distinct merge phases: an in-memory merge used when
> all batches are in memory, and a spilled merge used when some batches were
> written to disk. The in-memory merge is the faster of the two because it
> avoids disk I/O.
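The two-step sort/merge process above can be sketched in Java as follows. This is a minimal illustration, assuming each batch is modeled as a plain `int[]` and that all batches fit in memory (no spilling is modeled). The class and method names (`SortMergeSketch`, `sortBatches`, `mergeInMemory`) are hypothetical and are not Drill's actual external sort code. The merge is a standard k-way merge driven by a `java.util.PriorityQueue` of per-batch cursors.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

class SortMergeSketch {

    // Phase 1: sort each incoming batch independently (hypothetical int[] batches).
    static List<int[]> sortBatches(List<int[]> incoming) {
        List<int[]> sorted = new ArrayList<>();
        for (int[] batch : incoming) {
            int[] copy = Arrays.copyOf(batch, batch.length);
            Arrays.sort(copy);
            sorted.add(copy);
        }
        return sorted;
    }

    // Phase 2: in-memory k-way merge using a min-heap of (batch, position) cursors.
    static int[] mergeInMemory(List<int[]> sortedBatches) {
        int total = 0;
        for (int[] b : sortedBatches) {
            total += b.length;
        }
        int[] out = new int[total];
        // Each heap entry is {batchIndex, positionInBatch}, ordered by the value it points at.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
            (x, y) -> Integer.compare(sortedBatches.get(x[0])[x[1]],
                                      sortedBatches.get(y[0])[y[1]]));
        for (int i = 0; i < sortedBatches.size(); i++) {
            if (sortedBatches.get(i).length > 0) {
                heap.add(new int[] {i, 0});
            }
        }
        int k = 0;
        while (!heap.isEmpty()) {
            int[] cur = heap.poll();
            int[] batch = sortedBatches.get(cur[0]);
            out[k++] = batch[cur[1]];
            // Advance this batch's cursor if it still has rows left.
            if (cur[1] + 1 < batch.length) {
                heap.add(new int[] {cur[0], cur[1] + 1});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<int[]> incoming = List.of(
            new int[] {5, 1, 9}, new int[] {4, 8, 2}, new int[] {7, 3, 6});
        int[] result = mergeInMemory(sortBatches(incoming));
        System.out.println(Arrays.toString(result));
    }
}
```

Running `main` prints `[1, 2, 3, 4, 5, 6, 7, 8, 9]`. Even when every batch is already sorted, the merge still touches every row once more and pays a heap operation per row, which is the cost the single-batch case can avoid.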
> A special case occurs when the sort sees only a single batch of data. In this
> case, that one batch is already sorted: there is no reason to also run the
> merge phase. Skipping the merge will speed up small "operational" queries.
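The single-batch shortcut amounts to one extra check before the merge. The sketch below again assumes `int[]` batches and uses hypothetical names (`SingleBatchShortcut`, `mergeCalls`). It returns the lone sorted batch directly and only falls back to a merge when there are two or more batches. For brevity, the fallback merge here folds batches together with two-way merges; it is an illustrative stand-in, not Drill's merge implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SingleBatchShortcut {

    // Counts how often the merge phase actually runs (for illustration only).
    static int mergeCalls = 0;

    // Sorts every batch, then skips the merge entirely when there is only one batch.
    static int[] sort(List<int[]> incoming) {
        List<int[]> sorted = new ArrayList<>();
        for (int[] batch : incoming) {
            int[] copy = Arrays.copyOf(batch, batch.length);
            Arrays.sort(copy);
            sorted.add(copy);
        }
        if (sorted.isEmpty()) {
            return new int[0];
        }
        if (sorted.size() == 1) {
            // The lone batch is already fully sorted: hand it back as the output.
            return sorted.get(0);
        }
        return merge(sorted);
    }

    // Multi-batch case: fold each sorted batch into an accumulator with a two-way merge.
    static int[] merge(List<int[]> sorted) {
        mergeCalls++;
        int[] acc = sorted.get(0);
        for (int i = 1; i < sorted.size(); i++) {
            acc = mergeTwo(acc, sorted.get(i));
        }
        return acc;
    }

    static int[] mergeTwo(int[] a, int[] b) {
        int[] out = new int[a.length + b.length];
        int i = 0, j = 0, k = 0;
        while (i < a.length && j < b.length) {
            out[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];
        }
        while (i < a.length) out[k++] = a[i++];
        while (j < b.length) out[k++] = b[j++];
        return out;
    }

    public static void main(String[] args) {
        int[] single = sort(List.of(new int[] {3, 1, 2}));
        int[] multi = sort(List.of(new int[] {3, 1}, new int[] {2, 0}));
        System.out.println(Arrays.toString(single) + " " + Arrays.toString(multi)
            + " merges=" + mergeCalls);
    }
}
```

Running `main` prints `[1, 2, 3] [0, 1, 2, 3] merges=1`: the single-batch call never enters the merge, while the two-batch call does.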
> The effect of the optimization was measured using low-level unit tests that
> set up the sort and timed only the sort itself, excluding normal query
> overhead. Each run consisted of two phases. In the first phase, the test code
> was run five times to warm up the JVM and the Drill code cache. Then the
> "money" run executed the code another five times, and those run times were
> averaged.
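The warm-up-then-measure procedure above can be expressed as a small timing harness. This is a sketch under stated assumptions: the workload is any `Runnable`, timing uses `System.nanoTime()`, and the stand-in workload in `main` (sorting 64K random ints) is hypothetical rather than the actual Drill unit test. The class name `WarmThenMeasure` is likewise illustrative.

```java
import java.util.Arrays;
import java.util.Random;

class WarmThenMeasure {

    // Runs the workload `warmups` times untimed, then `runs` times timed,
    // and returns the mean run time in milliseconds.
    static double averageMillis(Runnable workload, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) {
            workload.run(); // untimed: gives the JIT a chance to compile hot paths
        }
        long totalNanos = 0;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            workload.run();
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / (runs * 1_000_000.0);
    }

    public static void main(String[] args) {
        // Hypothetical stand-in workload: sort a fresh copy of 64K random ints.
        int[] data = new Random(42).ints(65_536).toArray();
        double ms = averageMillis(() -> Arrays.sort(data.clone()), 5, 5);
        System.out.printf("average run time: %.2f ms%n", ms);
    }
}
```

Sorting a fresh `clone()` each time keeps every timed run doing the same work; sorting `data` in place would make later runs operate on already-sorted input. For more rigorous numbers, a dedicated harness such as JMH manages warm-up and measurement iterations itself.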
> Data consisted of 64K rows of a very simple schema: (INT, VARCHAR(5)).
> Run time without the optimization: 39 ms.
> Run time with the optimization: 25 ms.
> The result is about a 36% reduction in run time.
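The improvement can be checked directly from the two averaged run times, both as a fractional reduction in run time and as a speedup ratio:

```latex
\frac{39\,\text{ms} - 25\,\text{ms}}{39\,\text{ms}} = \frac{14}{39} \approx 0.36
\qquad\qquad
\frac{39\,\text{ms}}{25\,\text{ms}} = 1.56
```

Read the second way, the optimized sort runs about 1.56 times as fast as the unoptimized one (roughly 56% faster).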
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)