On Friday, 15 November 2013 at 21:46:26 UTC, Vladimir Panteleev
wrote:
clip
The algorithm gets stuck on "slopes" in sorted data, which are
not uncommon in real-world data (my data resembled a 1D
heightmap). Although the median-of-three way of picking a pivot
is recommended by some sources (notably Sedgewick), perhaps a
better method would be better suitable for Phobos:
If you use median-of-five (or seven) then you would start getting
reasonable pivots in such an instance, though I am sure someone
could craft an input set that defeats that. Does the Sedgewick
source mention why 3 is picked? Easy to implement I assume.
* Many sources recommend using a random element as a pivot.
According to [2], "Randomized quicksort, for any input, it
requires only O(n log n) expected time (averaged over all
choices of pivots)". Also, if it is not possible to predict the
pivot choice, it is impossible to craft worst-case input, which
is a plus from a security point[3]. However, I'm not sure if
making the behavior of std.algorithm's sort nondeterministic is
desirable.
What are the arguments against using a randomized algorithm?
* It looks like many STL implementations use Introsort[4] - a
hybrid sorting algorithm which starts with QuickSort, but
switches to HeapSort if the recursion limit exceeds a threshold.
* I found a recent article[5] which describes an optimal
algorithm of picking a pivot, however it is behind a paywall.
[1]: Apparently the term is also used to refer to choosing
three points *randomly*, instead of first/middle/last:
http://stackoverflow.com/a/164183/21501
[2]:
http://en.wikipedia.org/wiki/Quicksort#Analysis_of_Randomized_quicksort
[3]:
https://www.google.com/search?q=quicksort+worst+case+denial+of+service
[4]: http://en.wikipedia.org/wiki/Introsort
[5]: https://news.ycombinator.com/item?id=6629117