On Sunday, 17 November 2013 at 00:18:24 UTC, Chris Cain wrote:
> I think it's more complicated than that. Let's assume for a moment that you've proven that such an unstable sort must exist that is faster (I'm not convinced that it necessarily must take extra work to maintain stability). You have not shown how much faster it might be (it could be only 1% faster) nor how much work it would take to discover (even an ideal pivot choice for quicksort actually cannot be as fast as Timsort on an already sorted sequence, and quicksort is not an appropriate algorithm for accepting presorted subsequences). If it takes 2 years to come up with an unstable sort faster than Timsort, then it seems like a long time to be using something that isn't the best that we currently have. Especially if D is to be used in benchmarks against other languages.

Regarding an ideal pivot choice for quicksort, I'd like to emphasize that it is simply non-existent. Here's why.

Let us measure quicksort performance as the number of comparisons it makes.

Let us assume that quicksort proceeds as follows:
--(1) choose a pivot p in O(1) comparisons;
--(2) partition the segment into two;
--(3) run quicksort recursively on the two resulting segments.
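
For concreteness, here is a minimal Python sketch of a quicksort of that shape (first-element pivot for simplicity; any O(1)-comparison pivot rule gives the same structure):

def quicksort(a):
    if len(a) <= 1:
        return a
    p, lo, hi = a[0], [], []                    # (1) pivot chosen in O(1) comparisons
    for x in a[1:]:                             # (2) partition: one comparison per element
        (lo if x < p else hi).append(x)
    return quicksort(lo) + [p] + quicksort(hi)  # (3) recurse on the two segments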

Now, (2) is effectively done by comparing every element with p, and thus takes Theta(n) (on the order of n) comparisons, where n is the length of the current segment.

Under these assumptions, we can construct a killer array for any quicksort of this kind, even if we don't know exactly how the pivot is chosen (say, the source is closed or obfuscated), as long as we have the following interface to it:

Instead of accessing the array a[] directly, quicksort calls a function less(i,j) which tells it whether a[i] < a[j], and we control that function.

Now, we start with all values in the array a[] undefined. With each a[i], we associate a counter c[i]: how many times it has taken part in a comparison. As we can see from the above, during a single call to quicksort, the pivot will eventually accumulate Theta(n) comparisons in steps (1) and (2), while every other element gets only O(1) comparisons.

So here's what we'll do.

1. When a comparison asks to relate two undefined values, we pick the one with the larger c[i] and fix it as the lowest number still available. That is, the first fixed value will be 0, the next 1, and so on. (In the end, a[] will be a permutation of 0, 1, ..., n-1.)

2. When we are asked to relate a defined value to an undefined one, we answer that the defined value is lower (this is consistent: when we eventually define the undefined one, it will receive a value higher than all values defined before it).

3. When we are asked about two defined values, we just tell the truth.
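
Here is a Python sketch of these three rules (my rendering; McIlroy's paper gives a C version). The names reset/val/cnt/nxt are mine, but less(i,j) is exactly the interface described above:

val, cnt, nxt = [], [], 0

def reset(n):
    global val, cnt, nxt
    val = [None] * n   # all values start undefined
    cnt = [0] * n      # c[i]: comparisons a[i] has taken part in
    nxt = 0            # lowest value not yet handed out

def less(i, j):
    global nxt
    cnt[i] += 1
    cnt[j] += 1
    if val[i] is None and val[j] is None:
        # Rule 1: fix the more-compared element (the likely pivot)
        # as the lowest number still available.
        k = i if cnt[i] >= cnt[j] else j
        val[k], nxt = nxt, nxt + 1
    if val[i] is None:
        return False           # Rule 2: the defined a[j] is lower
    if val[j] is None:
        return True            # Rule 2: the defined a[i] is lower
    return val[i] < val[j]     # Rule 3: both defined, tell the truth

(After a full run, freezing the leftover undefined entries in any order turns val[] into a concrete killer array for the same deterministic quicksort.)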

This simple algorithm ensures that, on each call to quicksort, we quickly fix the pivot as one of the O(1) lowest values on the segment. So, one of the next two segments will have length n-O(1), and the recursion gives us Theta(n) segments of linearly decreasing lengths, and thus Theta(n^2) total running time.
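
To watch it happen, one can run the quicksort sketched above (adapted to access a[] only through less) against this adversary and count comparisons; doubling n roughly quadruples the count, as Theta(n^2) predicts:

def quicksort_idx(idx):        # same quicksort, but on indices, via less
    if len(idx) <= 1:
        return idx
    p, lo, hi = idx[0], [], []
    for i in idx[1:]:
        (lo if less(i, p) else hi).append(i)
    return quicksort_idx(lo) + [p] + quicksort_idx(hi)

for n in (100, 200, 400):
    reset(n)
    quicksort_idx(list(range(n)))
    print(n, sum(cnt) // 2)    # each comparison increments two counters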

Now, the assumption of picking a pivot in O(1) comparisons covers a broad variety of pivot choices: first/last/middle/random element, median of three or five, a median of medians of a constant number of elements (e.g., the "ninther"), or any combination of these. A random pick fails in the following sense: if we seed the RNG, construct a killer case, and then run again with the same seed, the Theta(n^2) behavior is reproduced.

Dual-pivot quicksort can be forced to go quadratic in a similar fashion.

Now, introsort escapes this fate by switching to a guaranteed O(n log n) algorithm (typically heapsort) in step (3) when the recursion gets too deep. This comes at the (small) cost of checking the current depth on every call to quicksort.
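
A sketch of that idea, with heapsort as the fallback and a depth budget proportional to log n (the exact budget and fallback vary between implementations):

import heapq, math

def heapsort(a):               # the guaranteed O(n log n) fallback
    h = list(a)
    heapq.heapify(h)
    return [heapq.heappop(h) for _ in range(len(h))]

def introsort(a, depth=None):
    if depth is None:
        depth = 2 * int(math.log2(len(a))) if a else 0
    if len(a) <= 1:
        return a
    if depth == 0:             # recursion went too deep: bail out
        return heapsort(a)
    p, lo, hi = a[0], [], []
    for x in a[1:]:
        (lo if x < p else hi).append(x)
    return introsort(lo, depth - 1) + [p] + introsort(hi, depth - 1)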

The above is just my retelling of M. D. McIlroy's great short article "A Killer Adversary for Quicksort": http://www.cs.dartmouth.edu/~doug/mdmspe.pdf

Ivan Kazmenko.
