On Sunday, 17 November 2013 at 01:07:20 UTC, Ivan Kazmenko wrote:
On Sunday, 17 November 2013 at 00:18:24 UTC, Chris Cain wrote:
I think it's more complicated than that. Let's assume for a
moment that you've proven that such a faster unstable sort
must exist (I'm not convinced that maintaining stability
necessarily takes extra work). You have not
shown how much faster it might be (it could be only 1% faster)
nor how much work it would take to discover (even an ideal
pivot choice for quicksort actually cannot be as fast as
Timsort on an already sorted sequence, and quicksort is not an
appropriate algorithm for accepting presorted subsequences).
If it takes 2 years to come up with an unstable sort faster
than Timsort, then it seems like a long time to be using
something that isn't the best that we currently have.
Especially if D is to be used in benchmarks against other
languages.
Regarding an ideal pivot choice for quicksort, I'd like to
emphasize that it is simply non-existent. Here's why.
Let us measure quicksort performance as the number of
comparisons it makes.
Let us make some assumptions:
quicksort goes as
--(1) choose a pivot p in O(1) comparisons;
--(2) partition the segment into two;
--(3) run quicksort recursively on the two resulting segments.
Now, (2) is effectively done by comparing every element with p,
thus in Theta(n) (on the order of n) comparisons where n is the
length of the current segment.
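The three steps can be sketched directly; this is a minimal Python illustration (first-element pivot chosen only for brevity, any O(1) pivot choice fits the same skeleton):

```python
def quicksort(a):
    if len(a) <= 1:
        return a
    # (1) choose a pivot p in O(1) comparisons -- here simply the first element
    p = a[0]
    # (2) partition the segment into two, comparing every element with p:
    #     Theta(n) comparisons for a segment of length n
    lo = [x for x in a[1:] if x < p]
    hi = [x for x in a[1:] if x >= p]
    # (3) run quicksort recursively on the two resulting segments
    return quicksort(lo) + [p] + quicksort(hi)
```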
Under these assumptions, we can construct the killer array for
an arbitrary quicksort of this kind, even if we don't know
exactly how the pivot is chosen (say, closed source or
obfuscated code), as long as we have the following interface to it:
Instead of accessing the array a[], quicksort calls the
function less(i,j) which tells it whether a[i]<a[j], and we
control that function.
Now, we start with all values in array a[] undefined. With
each a[i], we associate a counter c[i]: how many times it took
part in a comparison. As we can see from the above, during a
single call to quicksort, in steps (1) and (2) the pivot will
eventually get Theta(n) comparisons, while other elements will
all get only O(1) comparisons each.
So here's what we'll do.
1. When a comparison asks to relate two undefined values, we
pick the one with the larger c[i] and make it the lowest
number possible. That is, the first fixed value will be 0, the
next 1, and so on. (In the end, a[] will be a permutation of
0, 1, ..., n-1.)
2. When we are asked to relate a defined value to an undefined
one, the defined value will always be lower (because, when we
eventually define the undefined one, it will be higher than all
the values defined before it).
3. When we are asked about two known values, just tell the
truth.
This simple algorithm ensures that, on each call to quicksort,
we quickly fix the pivot as one of the O(1) lowest values on
the segment. So, one of the next two segments will have length
n-O(1), and the recursion gives us Theta(n) segments of
linearly decreasing lengths, and thus Theta(n^2) total running
time.
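Rules 1-3 translate almost directly into code. Here is a Python sketch of the adversary (class and function names are mine, not McIlroy's), driven against a quicksort that uses a median-of-three pivot; the printed comparison counts grow roughly quadratically rather than as n log n:

```python
class Adversary:
    """Killer adversary: decides the values of a[] lazily, during the sort."""
    def __init__(self, n):
        self.val = [None] * n   # a[i]; None means still undefined
        self.cnt = [0] * n      # c[i]: comparisons a[i] has taken part in
        self.next_val = 0       # lowest value not yet handed out
        self.ncmp = 0           # total calls to less()

    def less(self, i, j):       # answers: is a[i] < a[j] ?
        self.ncmp += 1
        self.cnt[i] += 1
        self.cnt[j] += 1
        if self.val[i] is None and self.val[j] is None:
            # Rule 1: freeze the more-compared element (the pivot candidate)
            # as the lowest value still available.
            k = i if self.cnt[i] >= self.cnt[j] else j
            self.val[k] = self.next_val
            self.next_val += 1
        if self.val[j] is None:
            return True         # Rule 2: any defined value < any undefined one
        if self.val[i] is None:
            return False        # Rule 2, other way around
        return self.val[i] < self.val[j]  # Rule 3: tell the truth

def med3(x, y, z, less):        # median-of-three pivot, O(1) comparisons
    if less(x, y):
        if less(y, z):
            return y
        return z if less(x, z) else x
    if less(x, z):
        return x
    return z if less(y, z) else y

def quicksort(ids, less):       # sorts indices; sees a[] only through less()
    if len(ids) <= 1:
        return ids
    p = ids[0] if len(ids) < 3 else med3(ids[0], ids[len(ids) // 2], ids[-1], less)
    lo = [x for x in ids if x != p and less(x, p)]
    hi = [x for x in ids if x != p and not less(x, p)]
    return quicksort(lo, less) + [p] + quicksort(hi, less)

for n in (32, 64, 128):
    adv = Adversary(n)
    quicksort(list(range(n)), adv.less)
    print(n, adv.ncmp)          # count roughly quadruples as n doubles
```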
Now, the assumption of picking a pivot in O(1) comparisons
covers a broad variety of pivot choices, including
first/last/middle/random element, median of three or five,
median of medians, or any combination of these. The random
pick fails in the following sense: if we seed the RNG,
construct a killer case, and then start with the same seed, we
get Theta(n^2) behavior reproduced.
Double-pivot quicksort can be forced to go quadratic in a
similar fashion.
Now, introsort escapes this fate by switching to a
guaranteed-n-log-n algorithm instead of (3) when it goes too
deep. This comes at the (small) cost of checking the current
recursion depth on every call to quicksort.
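That escape hatch can be sketched like this (Python, names mine, with heapsort standing in for the guaranteed-n-log-n fallback):

```python
import heapq
import math

def heapsort(a):
    # guaranteed O(n log n) fallback, whatever the input looks like
    h = list(a)
    heapq.heapify(h)
    return [heapq.heappop(h) for _ in range(len(h))]

def introsort(a, depth=None):
    if depth is None:
        # typical budget: about 2 * log2(n) levels of plain quicksort
        depth = 2 * max(1, int(math.log2(max(len(a), 2))))
    if len(a) <= 1:
        return a
    if depth == 0:
        return heapsort(a)      # recursed too deep: bail out of quicksort
    p = a[0]                    # any O(1) pivot choice
    lo = [x for x in a[1:] if x < p]
    hi = [x for x in a[1:] if x >= p]
    return introsort(lo, depth - 1) + [p] + introsort(hi, depth - 1)
```

Even on an input that drives this first-element pivot quadratic (a reversed run, say), the depth budget caps the quicksort phase and heapsort finishes the job in O(n log n).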
The above is just my retelling of a great short article "A
Killer Adversary for Quicksort" by M. D. McIlroy here:
http://www.cs.dartmouth.edu/~doug/mdmspe.pdf
Ivan Kazmenko.
Woah, that sounds complicated. I'm not 100% sure I entirely
understand, but it seems to rely on fixing the elements' values
while you sort. How would that work in real life?
For example, how would this defeat the following pivoting method.
1. Find the median value in the array. This can be done
deterministically in linear time, so from a theoretical point of
view it is just as fast as picking a random element, since you
still use only linear time.
2. Use that as the pivot.
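For reference, the deterministic linear-time median in step 1 is usually the classic median-of-medians (groups-of-five) selection; here is a Python sketch (function names mine). Note that this pivot choice spends Theta(n) comparisons, so it does not fall under the O(1)-comparison pivot assumption above.

```python
def select(a, k):
    """k-th smallest (0-based) element of a, worst-case linear time."""
    if len(a) <= 5:
        return sorted(a)[k]
    # median of each group of 5, then the median of those medians as pivot
    groups = [sorted(a[i:i + 5]) for i in range(0, len(a), 5)]
    meds = [g[(len(g) - 1) // 2] for g in groups]
    piv = select(meds, (len(meds) - 1) // 2)
    lo = [x for x in a if x < piv]
    eq = [x for x in a if x == piv]
    hi = [x for x in a if x > piv]
    if k < len(lo):
        return select(lo, k)
    if k < len(lo) + len(eq):
        return piv
    return select(hi, k - len(lo) - len(eq))

def median(a):
    return select(a, (len(a) - 1) // 2)
```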