Re: Worst-case performance of quickSort / getPivot

Chris Cain Fri, 15 Nov 2013 15:06:31 -0800

On Friday, 15 November 2013 at 21:46:26 UTC, Vladimir Panteleevwrote:

I've been investigating an instance that I ran into whileprocessing some data, in which multiSort had apparently stalledwhereas two successive sort calls processed the data quickly.After reducing the code and 20 million points, I've reduced theproblem to this test case:
enum N=1000000;
N.iota.retro.chain(N.only).array.sort();
Apparently, this simple case exhibits worst-case (quadratic)complexity in our sort implementation. The .retro isn'tnecessary - this also exhibits quadratic complexity (athoughthe total number of predicate calls is halved), because thefirst quickSort cycle will flip the order of elements duringpartitioning:
N.iota.chain(0.only).array.sort();
Closer inspection shows that the problem is manifested due tohow the pivot is chosen. Currently, std.algorithm's getPivotuses the median-of-three method (median value of first, middleand last element[1]). However, in this case, the median elementwill always be the first element - this remains true in allquickSort iterations.
Here is an illustrated example with N=9. Note that at the sametime as choosing the pivot, getPivot also orders the candidates(first, middle and last range elements) according to the sortpredicate. After calling getPivot, quickSortImpl moves thepivot to the end of the range by swapping it with the lastelement.
getPivot(0..10)
8,7,6,5,4,3,2,1,0,9 <- getPivot - before swap
9,7,6,5,4,8,2,1,0,3 <- getPivot - after swap
9,7,6,5,4,3,2,1,0,8 <- quickSortImpl - after swap
9,8,6,5,4,3,2,1,0,7 <- quickSortImpl - after partition
getPivot(2..10)
6,5,4,3,2,1,0,7 <- getPivot - before swap
7,5,4,3,6,1,0,2 <- getPivot - after swap
7,5,4,3,2,1,0,6 <- quickSortImpl - after swap
7,6,4,3,2,1,0,5 <- quickSortImpl - after partition
(...)
The algorithm gets stuck on "slopes" in sorted data, which arenot uncommon in real-world data (my data resembled a 1Dheightmap). Although the median-of-three way of picking a pivotis recommended by some sources (notably Sedgewick), perhaps abetter method would be better suitable for Phobos:
* Many sources recommend using a random element as a pivot.According to [2], "Randomized quicksort, for any input, itrequires only O(n log n) expected time (averaged over allchoices of pivots)". Also, if it is not possible to predict thepivot choice, it is impossible to craft worst-case input, whichis a plus from a security point[3]. However, I'm not sure ifmaking the behavior of std.algorithm's sort nondeterministic isdesirable.* It looks like many STL implementations use Introsort[4] - ahybrid sorting algorithm which starts with QuickSort, butswitches to HeapSort if the recursion limit exceeds a threshold.* I found a recent article[5] which describes an optimalalgorithm of picking a pivot, however it is behind a paywall.
[1]: Apparently the term is also used to refer to choosingthree points *randomly*, instead of first/middle/last:http://stackoverflow.com/a/164183/21501[2]:http://en.wikipedia.org/wiki/Quicksort#Analysis_of_Randomized_quicksort[3]:https://www.google.com/search?q=quicksort+worst+case+denial+of+service
[4]: http://en.wikipedia.org/wiki/Introsort
[5]: https://news.ycombinator.com/item?id=6629117

Improving quicksort is probably a good idea, but IIRC, the stablesort has been changed to Timsort, which I find to be faster inmost cases. Probably what should be done is change the default tostable sort and improve the unstable version to be faster in someway.

Presumably you could use a "randomized version," but with aconstant key. That way the algorithm would be deterministic butlikely to have good characteristics in normal inputs.Unfortunately that wouldn't solve the issue of an attackergetting to choose an input and causing worst case performance (itwould just make it harder to come up with such an input).Ultimately, using the stable version (Timsort) is safer forsecurity use (hence why I'd suggest it be the default).

Re: Worst-case performance of quickSort / getPivot

Reply via email to