On Monday, 22 April 2013 at 21:34:47 UTC, Andrei Alexandrescu wrote:

a) choose median of five

Sounds like it could decrease speed on average, and still perhaps fail on a simple case like [0, 1, 2, 3, ..., 999, 0, 0] or something? Didn't test it though.

b) if the array is large, sort a stride of it first (e.g. 1%) then choose the pivot as the median point of that stride

A bad case would be, e.g., the array containing almost sorted parts, and the stride getting taken from the part where lowest or highest elements reside.

c) add introspection to the algorithm, i.e. if an attempt to partition hits the worst case or near worst case, just try another pivot instead of moving forward with the sorting stage.

Now, trying another pivot again and again could give the same O(n^2) in a bad case.

One simple solution to this (which I forgot to mention) is
picking a pivot at random.

But it would again mean that the sort function depends on global state (RNG state). And if it doesn't (a local RNG is initialized somehow each time), an adversary would just have to hardcode it once to get the same O(n^2) worst case anytime he wants.

And on Tuesday, 23 April 2013 at 01:10:26 UTC, Xinok wrote:
I filed a bug report for this issue a year ago:
http://d.puremagic.com/issues/show_bug.cgi?id=7767

I've been meaning to fix this issue myself. Time allowing, I'll do it soon.

What I wonder now, what would be the goal of such a fix?

One possible goal would be to have an O(n log n) worst case sort. And that would perhaps cost some speed and/or memory on average. Besides, such a sort function is already implemented (TimSort), so it's just a matter of setting the default then.

Another goal would be to have O(n^2) only for cases which don't have a reasonable chance to occur in real world, but could still be constructed artificially by an adversary. And that could perhaps be done by just changing the getPivot implementation. Other languages' experience may be useful here:

1. GNU C++ 4.x std::sort implementation is IntroSort [1].
2. Microsoft .NET used Quicksort up to version 4.0 [2].
   Now in .NET 4.5 they use Introsort too [3].
3. For primitive types, Java 6 used a tuned QuickSort [4].
   The current Java 7 uses Dual-Pivot QuickSort [5].
   Java uses TimSort for Objects [6].
4. Python uses TimSort since version 2.3 [7].

In any case, there are techniques to construct a bad case given a QuickSort implementation. One of them is described by M. Douglas McIlroy in "A Killer Adversary for Quicksort" [8]. We run QuickSort on a would-be-array "a" controlling the "a[i] < a[j]" predicate, and, using only certain assumptions (not looking at the code!), we build a killer-case array on the fly. Given some thought, the technique could perhaps be extended to Dual-Pivot QuickSort too. As long as some flavor of QuickSort (and topN) is in Phobos, such a unittest would fit the sort implementation nicely, if only to show just how "bad" it can get.

Ivan Kazmenko.

-----

References:
[1] http://gcc.gnu.org/onlinedocs/libstdc++/libstdc++-html-USERS-4.4/a01027.html#g152148508b4a39e15ffbfbc987ab653a [2] http://msdn.microsoft.com/en-us/library/6tf1f0bc%28v=vs.100%29.aspx [3] http://msdn.microsoft.com/en-us/library/6tf1f0bc%28v=vs.110%29.aspx [4] http://docs.oracle.com/javase/6/docs/api/java/util/Arrays.html#sort%28int[]%29 [5] http://docs.oracle.com/javase/7/docs/api/java/util/Arrays.html#sort%28int[]%29 [6] http://docs.oracle.com/javase/7/docs/api/java/util/Arrays.html#sort%28java.lang.Object[]%29
[7] http://en.wikipedia.org/wiki/Timsort
[8] http://www.cs.dartmouth.edu/~doug/mdmspe.pdf

Reply via email to