On Thu, Jul 26, 2001 at 09:41:00PM -0400, Bernie Cosell wrote:
> On 27 Jul 01, at 3:13, Abigail wrote:
>
> > On Thu, Jul 26, 2001 at 08:45:17PM -0400, Bernie Cosell wrote:
> > >
> > > A tricky design decision is whether to go with a sort with better
> > > average performance in exchange for worse worst-case performance
> > > [e.g., quick sort and quickersort, that are O(N^2) worst case, if I
> > > remember long-ago comp science classes right, but are *usually*
> > > better than nlogn]. Another situation like this is a list-merge sort
> > > which is *fantastic* if the data is nearly sorted...
> >
> >
> > No, they are not usually better than N log N. You can prove that in
> > general you cannot sort faster than N log N. Period. Quicksort is
> > expected (don't use the term 'average' here, that's too vague)
> > O (N log N), but Theta (N^2) worst case.
>
> There are a lot of things I don't remember about all this, but what
> you say doesn't ring a bell, but it might well be---
>
> 1) whether the expected time for a sort can never be better than
> nlogn. I could be convinced, but that's not a math-factoid that I
> actually know. Proving something about the worst case isn't too
> hard, but proving something about the expected value [over the
> assumption that all possible orderings of the input data are equally
> probable] is trickier, I think... but maybe I'm not seeing a trick to
> do that.
>
> 2) Quicksort and Quickersort: I'm not sure I've ever seen a full
> deriviation of the expected times for those sorts. It is certainly
> possible that they're worse than nlogn (many sorts are) -- is it
> really true/provable that they have expected value O(nlogn)? Neat!]
For Quicksort, it's easily proven that, regardless of the input, it will
never run in time less than N log N. The basic algorithm is:
partition array_of_size_N;
recurse first_part;
recurse second_part;
merge
The partition/merge steps take O (N). So, we have:
T(N) = T(aN) + T(bN) + O (N), a + b == 1 (aN and bN are the sizes of the two parts).
T(1) = O(1)
T(N) is minimal if a == b == 1/2 (perfect split). But even then T(N) = O(N log N).
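The perfect-split case of that recurrence can be checked numerically. A quick
illustration (not part of the proof above; the function name and the choice
of T(1) = 1 are mine), unrolling T(N) = 2 T(N/2) + N:

```python
def T(n):
    """Unroll T(N) = 2*T(N/2) + N with T(1) = 1 (perfect split)."""
    if n <= 1:
        return 1
    return 2 * T(n // 2) + n

# For N a power of two, the closed form is T(N) = N*log2(N) + N
# (the extra N comes from the N leaves, each costing T(1) = 1).
for k in range(1, 11):
    n = 2 ** k
    assert T(n) == n * k + n
```

So even with a perfect split at every level, the work grows as N log N, never slower.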
Only if you make a three-way split (elements smaller than the pivot,
elements equal to the pivot, elements larger than the pivot) and
you have lots and lots of duplicates can quicksort run faster than
O(N log N).
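A minimal sketch of such a three-way quicksort in Python (the function name
and list-based structure are my own illustration, not any particular
implementation):

```python
def quicksort3(a):
    """Three-way quicksort: partition into <, ==, > the pivot.

    With many duplicates, the 'equal' part needs no further
    sorting, so the recursion bottoms out much faster."""
    if len(a) <= 1:
        return a
    pivot = a[len(a) // 2]
    smaller = [x for x in a if x < pivot]
    equal   = [x for x in a if x == pivot]
    larger  = [x for x in a if x > pivot]
    return quicksort3(smaller) + equal + quicksort3(larger)
```

Note that an input of N identical elements is handled in a single O(N)
partitioning pass, with no recursion at all.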
> The classic instance of this is the traditional list-merge sort,
> where you have a HUGE sorted database and a relatively small number
> of new records to add... the best strategy is to do a separate sort
> on the new stuff, and then do a merge, which gets you to
> (DB + NEW) + NEWlogNEW
> which will *kill* most sorts you try to do on the entire list [e.g.,
> just append the new entries onto the end of the list and then sort
> the whole mess].
In the latter situation (appending the new entries and sorting the whole
list), the sort in Perl 5.7 will actually notice the long run of sorted
data and take advantage of it. (Yes, it is a merge sort.)
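The sort-then-merge strategy described above can be sketched as follows
(a Python illustration using the standard library's heapq.merge; the
function and variable names are mine):

```python
import heapq

def add_records(sorted_db, new_records):
    """Merge a small batch of new records into a large sorted list.

    Cost: O(NEW log NEW) to sort the batch, plus O(DB + NEW) for
    the linear merge -- far cheaper than re-sorting everything."""
    batch = sorted(new_records)                  # NEW log NEW
    return list(heapq.merge(sorted_db, batch))   # DB + NEW
```

heapq.merge does a streaming merge of already-sorted inputs, which is
exactly the linear (DB + NEW) pass the quoted analysis counts on.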
Abigail