[ https://issues.apache.org/jira/browse/DRILL-5350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Pritesh Maker updated DRILL-5350: --------------------------------- Fix Version/s: (was: 1.13.0) > Performance: skip merge for single-batch sort > --------------------------------------------- > > Key: DRILL-5350 > URL: https://issues.apache.org/jira/browse/DRILL-5350 > Project: Apache Drill > Issue Type: Improvement > Affects Versions: 1.10.0 > Reporter: Paul Rogers > Assignee: Paul Rogers > Priority: Minor > > The external sort uses the classic two-step sort/merge process: > * Sort each incoming batch. (Optionally spill batches when needed.) > * Merge batches to create the final output. > The external sort uses two distinct merge phases: one if all batches are in > memory, another if some batches were spilled. The memory merge is obviously > the fastest. > A special case occurs when the sort sees only a single batch of data. In this > case, that one batch is already sorted: there is no reason to also run the > merge phase. Skipping the merge will speed up small "operational" queries. > The effect of the optimization was measured using low-level unit tests that > set up the sort and measured just the sort run time, omitting normal query > overhead. Each run consisted of two phases. In the first phase, the test code > was run five times to warm the JVM and Drill code cache. Then, the "money' > run ran another five times. Run times where then averaged. > Data consisted of 64K rows of a very simple schema: (INT, VARCHAR(5)). > Run time without the optimization: 39 ms. > Run time with the optimization: 25 ms. > The result is about a 46% improvement. -- This message was sent by Atlassian JIRA (v7.6.3#76005)