Paul Rogers created DRILL-5350:
----------------------------------
Summary: Performance: skip merge for single-batch sort
Key: DRILL-5350
URL: https://issues.apache.org/jira/browse/DRILL-5350
Project: Apache Drill
Issue Type: Improvement
Affects Versions: 1.10.0
Reporter: Paul Rogers
Assignee: Paul Rogers
Priority: Minor
Fix For: 1.11.0
The external sort uses the classic two-step sort/merge process:
* Sort each incoming batch. (Optionally spill batches when needed.)
* Merge batches to create the final output.
The external sort uses two distinct merge phases: one if all batches are in
memory, another if some batches were spilled. The memory merge is obviously the
fastest.
A special case occurs when the sort sees only a single batch of data. In this
case, that one batch is already sorted: there is no reason to also run the
merge phase. Skipping the merge will speed up small "operational" queries.
The effect of the optimization was measured using low-level unit tests that set
up the sort and measured just the sort run time, omitting normal query
overhead. Each run consisted of two phases. In the first phase, the test code
was run five times to warm the JVM and Drill code cache. Then, the "money' run
ran another five times. Run times where then averaged.
Data consisted of 64K rows of a very simple schema: (INT, VARCHAR(5)).
Run time without the optimization: 39 ms.
Run time with the optimization: 25 ms.
The result is about a 46% improvement.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)