On 27 Jul 01, at 3:13, Abigail wrote:

> On Thu, Jul 26, 2001 at 08:45:17PM -0400, Bernie Cosell wrote:
> > 
> > A tricky design decision is whether to go with a sort with better 
> > average performance in exchange for worse worst-case performance 
> > [e.g., quick sort and quickersort, that are O(N^2) worst case, if I 
> > remember long-ago comp science classes right, but are *usually* 
> > better than nlogn].  Another situation like this is a list-merge sort 
> > which is *fantastic* if the data is nearly sorted...
> 
> 
> No, they are not usually better than N log N. You can prove that in
> general you cannot sort faster than N log N. Period. Quicksort is
> expected (don't use the term 'average' here, that's too vague)
> O (N log N), but Theta (N^2) worst case.

There are a lot of things I don't remember about all this, and what 
you say doesn't ring a bell, though it might well be right.  Two 
things I'm unsure of---

1) whether the expected time for a sort can never be better than 
nlogn.  I could be convinced, but that's not a math-factoid that I 
actually know.  Proving something about the worst case isn't too 
hard, but proving something about the expected value [over the 
assumption that all possible orderings of the input data are equally 
probable] is trickier, I think... maybe I'm just not seeing the 
trick to do that.
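[For the record, the standard decision-tree sketch does cover the 
expected case too, at least for comparison sorts, assuming all N! 
input orderings are equally likely:]

```latex
% Any comparison sort is a binary decision tree with at least N!
% leaves, one per input permutation.  Worst case = tree depth,
% which is >= log2(N!).  Expected case (uniform over inputs) =
% average leaf depth, and a binary tree with L leaves has average
% leaf depth >= log2(L), so the same bound holds:
\log_2 N! \;=\; \sum_{k=1}^{N} \log_2 k
         \;\ge\; \frac{N}{2}\,\log_2\frac{N}{2}
         \;=\; \Omega(N \log N)
```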

2) Quicksort and Quickersort: I'm not sure I've ever seen a full 
derivation of the expected times for those sorts.  It is certainly 
possible that they're worse than nlogn (many sorts are) -- is it 
really true/provable that they have expected value O(nlogn)?  Neat!


> Only when you have restricted domains, you may be able to sort faster.

Or when you know something about the data.  For example, if it is 
partially sorted, or tends to have "runs", or you know something 
about the distribution of keys in the keyspace [the 'expected value' 
calculations are generally based on the assumption that EVERY 
possible input-key-set ordering is equally probable, which is almost 
NEVER the case in real-world datasets], or the like.
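[The textbook example of beating nlogn on a restricted domain is 
counting sort, which is O(N + K) when the keys are integers in 
[0, K).  A minimal sketch -- the name and signature are mine:]

```python
def counting_sort(keys, key_range):
    """O(N + K) sort for integer keys in range(key_range).
    No comparisons at all, so the N log N comparison-sort
    lower bound simply doesn't apply."""
    counts = [0] * key_range
    for k in keys:          # tally each key: O(N)
        counts[k] += 1
    out = []
    for value, c in enumerate(counts):  # emit in order: O(K)
        out.extend([value] * c)
    return out

print(counting_sort([3, 1, 4, 1, 5, 2, 6], 10))
# [1, 1, 2, 3, 4, 5, 6]
```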

The classic instance of this is the traditional list-merge sort, 
where you have a HUGE sorted database and a relatively small number 
of new records to add...  the best strategy is to do a separate sort 
on the new stuff, and then do a merge, which gets you to
  O(DB + NEW) + O(NEW log NEW)
which will *kill* most sorts you try to do on the entire list [e.g., 
just append the new entries onto the end of the list and then sort 
the whole mess].
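[That sort-then-merge strategy is a few lines in practice.  A sketch 
-- the names are made up; Python's heapq.merge does the linear merge 
step:]

```python
import heapq

def add_records(sorted_db, new_records):
    """Sort only the small NEW batch -- O(NEW log NEW) -- then do
    one linear merge with the already-sorted DB -- O(DB + NEW)."""
    new_sorted = sorted(new_records)
    return list(heapq.merge(sorted_db, new_sorted))

db = list(range(0, 1000, 2))   # the HUGE already-sorted database
new = [17, 3, 999]             # the relatively small batch of new records
assert add_records(db, new) == sorted(db + new)
```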

   /Bernie\
-- 
Bernie Cosell                     Fantasy Farm Fibers
mailto:[EMAIL PROTECTED]     Pearisburg, VA
    -->  Too many people, too few sheep  <--          
