I want to thank Fergus Henderson and Ralf Hinze for their illuminating
comments on qsort, but I was really more focused on what we can expect
from Haskell compilers.  They describe what seem like a lot of gotchas
that don't need to be there.

On 07-May-1998, Ralf Hinze <[EMAIL PROTECTED]> wrote:
> > o it traverses the list twice,
> > o it has a space leak (xs is alive until the second traversal is done),

Is it necessary that the compiler do this?  It strikes me that when the
compiler encounters a function that runs multiple comprehensions over
the same list, it should be smart enough to consolidate them into a single
pass (which would also avoid the space leak).  Is my intuition correct
here?
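
To make the rewrite concrete, here is roughly the transformation I am
hoping the compiler can do for me -- just a sketch of the idea, not a
claim about what any particular compiler actually does:

-- Two traversals of xs (as in the qsort under discussion) versus a
-- hand-fused single pass over the same list.
twoPasses :: Ord a => a -> [a] -> ([a], [a])
twoPasses p xs = ([x | x <- xs, x < p], [x | x <- xs, x >= p])

onePass :: Ord a => a -> [a] -> ([a], [a])
onePass p = foldr step ([], [])
  where
    step x (los, his)
      | x < p     = (x : los, his)
      | otherwise = (los, x : his)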

> > o it uses (++) to catenate the results of the recursive calls (note that
> >   (++) takes time proportional to the length of its first argument).

This also seems weird.  Concatenation is a frequent enough operation on
lists.  Given that frequency, shouldn't the internal representation of a
list include a pointer to its last element, just to ensure that
concatenation is constant time?  I realize that you can't point to the end
of a lazy list, but concatenation is not relevant to lazy lists.
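
For comparison, I gather the usual hand-written workaround is to thread
an accumulator so that (++) never appears at all; a rough, untested
sketch of what I mean:

-- qsort with an accumulator: go xs rest yields the sorted xs followed
-- by rest, so the recursive results are consed together instead of
-- catenated with (++).
qsortAcc :: Ord a => [a] -> [a]
qsortAcc xs = go xs []
  where
    go []     rest = rest
    go (p:ps) rest = go smaller (p : go larger rest)
      where
        smaller = [x | x <- ps, x <  p]
        larger  = [x | x <- ps, x >= p]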

On Fri, 8 May 1998, Fergus Henderson wrote:
> Also it misses a few useful optimizations, such as median-of-three
> partitioning.  For descriptions of that and other ways to speed
> up quicksort, see Sedgewicks "Algorithms in C" or "Algorithms in Pascal"
> (I'm afraid Sedgewick didn't write an "Algorithms in Haskell" ;-).

I am not saying that there are not better algorithms.  I am saying that
Haskell should help the programmer avoid implementation level gotchas as
much as possible.

As far as sort performance goes, for most applications qsort seems
very suboptimal (just simple to code -- or so I thought).  My preferred
sorting algorithm has a worst case of O(n * m), where n is the length of
the list and m is the length of the longest element in the list.  The
problem is that I don't know how to code my preferred sorting algorithm in
Haskell (and I don't think it would be easy even if I did).
I don't know who the original inventor is or what the official name for
this algorithm is (I'm sure we reinvented the wheel here), but here is how
it works.

---
You recursively partition the list based on the first character not
already processed.  So if you have a list like:

[alex,fergus,robert,allen,frank,ralf]

At the next stage you have:
[(a,[alex,allen]),(f,[fergus,frank]),(r,[robert,ralf])]

It takes at most m passes until each node has only one leaf:
[(a,[(al,[(ale,[alex]),(all,[allen])])]),(f,[(fe,[fergus]),(fr,[frank])]),
 (r,[(ra,[ralf]),(ro,[robert])])]

There are many details to getting this right, but this is the general
idea.
---
I realize that this is not completely general.  It requires that the items
being sorted be recursively categorizable (I don't know how to
describe a type like that in Haskell).  But it works for strings and
ints, which are most of what I use.
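
For the string case at least, here is roughly what I am picturing (a
rough, untested sketch; the handful of distinct leading characters is
ordered with an ordinary sort, which is cheap because there are at most
256 of them):

import Data.List (nub, sort)

-- Rough sketch: recursively bucket the strings on their next
-- character.  Strings that have run out of characters come first.
radixSort :: [String] -> [String]
radixSort []  = []
radixSort [s] = [s]
radixSort xs  = empties ++ concat [ map (c:) (radixSort (tailsAfter c))
                                  | c <- nub (sort firstChars) ]
  where
    empties      = [s  | s <- xs, null s]
    firstChars   = [c  | (c:_)   <- xs]
    tailsAfter c = [cs | (c':cs) <- xs, c' == c]

On the example list above this does produce
[alex,allen,fergus,frank,ralf,robert], but it surely misses the details
I alluded to, and the repeated bucketing passes are hardly efficient as
written.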

As I write this, I realize I have no idea how to think about the general
version in Haskell.  Does the Haskell type system support this?  How do I
describe this tree data structure and maintain a list of leaves across the
data structure?
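
If it helps to make the question concrete, the shape I am groping toward
is presumably something like this (purely a guess at a suitable type):

-- A node either holds the finished items or fans out on the next
-- character; Char here is exactly the part I do not know how to
-- generalize to other "recursively categorizable" types.
data Trie a = Leaf [a]
            | Node [(Char, Trie a)]

Reading the leaves back out in order is just a left-to-right traversal,
but maintaining the list of leaves across the whole structure is the part
I cannot see how to express.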

-Alex-

___________________________________________________________________
S. Alexander Jacobson                   i2x Media  
1-212-697-0184 voice                    1-212-697-1427 fax





