On 11/16/13 4:18 PM, Chris Cain wrote:
> I think it's more complicated than that. Let's assume for a moment that
> you've proven that such an unstable sort must exist that is faster (I'm
> not convinced that it necessarily must take extra work to maintain
> stability).

Very simple. Any sequence with a large number of equivalent elements will cause Timsort to do more work than an unstable algorithm, because it must shift those elements around to preserve their relative order. (Consider e.g. 1, 0, 0, 0, ..., 0.) The discussion then shifts to how many equivalent elements the typical range contains, etc.
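To make the cost concrete, here is a hypothetical illustration (my own sketch, not Phobos code): on that input, an unstable sort may finish with a single swap, while a stable sort must move the 1 past every equal 0 so the zeros keep their original order.

import std.algorithm : bringToFront, swap;
import std.stdio : writeln;

void main()
{
    // Unstable: one swap sorts the array; the equal zeros may be
    // reordered among themselves, which an unstable sort is allowed to do.
    auto a = [1, 0, 0, 0, 0];
    swap(a[0], a[$ - 1]);
    writeln(a); // [0, 0, 0, 0, 1]

    // Stable: the 1 has to pass over every zero so the zeros keep their
    // relative order; bringToFront performs exactly that rotation.
    auto b = [1, 0, 0, 0, 0];
    bringToFront(b[0 .. 1], b[1 .. $]);
    writeln(b); // [0, 0, 0, 0, 1]
}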

> You have not shown how much faster it might be (it could be
> only 1% faster) nor how much work it would take to discover (even an
> ideal pivot choice for quicksort actually cannot be as fast as Timsort
> on an already sorted sequence, and quicksort is not an appropriate
> algorithm for accepting presorted subsequences).

There are well-known tweaks to quicksort that make it perform well on sorted or almost-sorted sequences.
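For instance, a rough sketch under my own assumptions (not what Phobos actually does): an O(n) early exit for already-sorted input plus median-of-three pivot selection keeps quicksort cheap on sorted and nearly sorted data.

import std.algorithm : isSorted, swap;

// Sketch of quicksort with two classic tweaks: an early exit for sorted
// input, and a median-of-three pivot to avoid the quadratic worst case
// on sorted/reverse-sorted sequences.
void tweakedQuickSort(int[] a)
{
    while (a.length > 1)
    {
        if (isSorted(a)) return; // already sorted: O(n) and we're done

        // Median-of-three: order a[0], a[mid], a[$ - 1]; pivot on the middle.
        immutable mid = a.length / 2, hi = a.length - 1;
        if (a[mid] < a[0]) swap(a[mid], a[0]);
        if (a[hi] < a[0]) swap(a[hi], a[0]);
        if (a[hi] < a[mid]) swap(a[hi], a[mid]);
        immutable pivot = a[mid];

        // Hoare partition around the pivot value.
        size_t i = 0, j = hi;
        for (;;)
        {
            while (a[i] < pivot) ++i;
            while (pivot < a[j]) --j;
            if (i >= j) break;
            swap(a[i], a[j]);
            ++i;
            --j;
        }

        // Recurse on the smaller half, iterate on the larger.
        if (j + 1 <= a.length - j - 1)
        {
            tweakedQuickSort(a[0 .. j + 1]);
            a = a[j + 1 .. $];
        }
        else
        {
            tweakedQuickSort(a[j + 1 .. $]);
            a = a[0 .. j + 1];
        }
    }
}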

> If it takes 2 years to
> come up with an unstable sort faster than Timsort, then it seems like a
> long time to be using something that isn't the best that we currently
> have. Especially if D is to be used in benchmarks against other languages.

> Timsort is implemented now and has no real cost to use as default. If,
> at some point, an unstable sort is discovered that is faster on average
> than Timsort AND (extremely important) it has no potential for an
> attacker to be able to choose a sequence to sort that causes poor
> performance, then unstable could be switched back as the default at some
> point.

We could do that, but it would mask a bug instead of fixing it. We should instead focus on fixing unstable sort. An additional issue with making stable sort the default for a while is that people will grow to assume sort() is stable, and their code will break when we switch back.
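A hypothetical example of how that dependence creeps in: ties on a sort key keep their submission order only while the default happens to be stable, and switching the default back would break such code silently.

import std.algorithm : sort, SwapStrategy;
import std.stdio : writeln;

struct Task { int priority; string name; }

void main()
{
    auto tasks = [Task(1, "b"), Task(0, "a"), Task(1, "c"), Task(1, "d")];

    // Asking for stability explicitly: ties keep their original order,
    // no matter what the default is.
    auto safe = tasks.dup;
    sort!("a.priority < b.priority", SwapStrategy.stable)(safe);

    // Relying on the default: the order of the three priority-1 tasks is
    // preserved only while the default is stable, and the code breaks
    // silently if the default is switched back to unstable.
    auto fragile = tasks.dup;
    sort!("a.priority < b.priority")(fragile);

    writeln(safe);
    writeln(fragile);
}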

Let's fix unstable sort. If it's slower than Timsort on any data other than Timsort's happy case, we have a bug.
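A minimal sketch of such a check (input and sizes are illustrative; it assumes std.datetime.stopwatch from a Phobos newer than this thread): time both sorts on the equal-element input discussed above.

import std.algorithm : sort, SwapStrategy;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : writeln;

void main()
{
    enum n = 1_000_000;
    auto input = new int[n];   // all zeros...
    input[0] = 1;              // ...except a single 1, as discussed above

    auto a = input.dup;
    auto sw1 = StopWatch(AutoStart.yes);
    sort(a);                                  // default, unstable
    writeln("unstable: ", sw1.peek());

    auto b = input.dup;
    auto sw2 = StopWatch(AutoStart.yes);
    sort!("a < b", SwapStrategy.stable)(b);   // Timsort
    writeln("stable:   ", sw2.peek());
}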


Andrei
