Ralf Hinze wrote:
> > > > o it uses (++) to catenate the results of the recursive calls (note that
> > > >   (++) takes time proportional to the length of its first argument).

Ok, clearly I was wrong about list concatenation, but I think I have now
figured out what was bothering me about Ralf's objection to the of ++.

The argument that the use of ++ in qsort means too many traversals of the
list makes sense if ++ is strict (or you are using a non-lazy language),
but I don't see why the list concatenation needs to happen until you are
going to traverse the list anyway.  Laziness means, as far as I understand
it, that for example:

> testlaziness =take 5 (map (2+) [1,2..]++[5,6..])  
Doesn't fail (it doesn't).

In other words, you only traverse the list when it is
necessary.  So, the time it takes to traverse is time you were going to
spend anyway and therefore ++ is effectively a constant time operation
(without destructive updates!). 

-Alex-

PS. I want to thank everybody  for all the response to this, it has been
very educational.

___________________________________________________________________
S. Alexander Jacobson                   i2x Media  
1-212-697-0184 voice                    1-212-697-1427 fax


> 
> If catenation is used frequently one should use a suitable data type
> which supports constant catenation. Chris Okasaki's tutorial on
> functional data types
> 
> 
>http://foxnet.cs.cmu.edu/afs/cs.cmu.edu/project/fox/mosaic/people/cokasaki/papers.html#ssafp96
> 
> contains a collection of sequence data types (the paper is _really_ worth
> reading).
> 
> > You recursively partition the list based on the first character not
> > already processed.  So if you have a list like:
> > 
> > [alex,fergus,robert,allen,frank,ralf]
> > 
> > At the next stage you have:
> > [(a,[alex,allen]),(f,[fergus,frank]),(r,[robert,ralf])]
> > 
> > It takes at most m passes until each node has only one leaf
> > [(a,[(al,[(ale,[alex]),(all,[allen])])]),(f,[(fe,[fergus]),(fr,[frank])]),
> >  (r,[(ra,[ralf]),(ro,[robert])])]
> > 
> > There are many details to getting this right, but this is the general
> > idea.
> > 
> > As I write this, I realize I have no idea how to think about this in
> > Haskell.  Does the Haskell type system support this?  How do I describe
> > this tree data structure and maintain a list of leaves accross the
> > datastructure?  
> 
> The data structure you request is a trie or a suffix tree (the Boolean
> flag indicates whether the empty list is contained or not).
> 
> > data Trie a         =  Leaf [a]                             -- leaf node
> >                     |  Node [(a, Trie a)] Bool              -- inner node
> 
> The sorting algorithm is relatively easy to formulate if the elements
> to be sorted are lists themselves (recall that a string is a list of
> characters). To construct a level in the tree one could use bucket sort.
> [The empty list must be considered separately.]
> 
> > import Array
> >
> > bucketSort bs x             =  [ b | b@(_, y) <- assocs bins, not (null y) ]
> >     where bins              =  accumList bs [ (a, as) | (a : as) <- x ]
> >
> > accumList                   :: (Ix a) => (a, a) -> [(a, b)] -> Array a [b]
> > accumList                   =  accumArray (flip (:)) []
> 
> > example                     =  ["alex", "fergus", "robert", "allen", "frank" 
>,"ralf"]
> 
> The expression
> 
>       bucketSort ('a','z') example
> 
> yields
> 
> [('a', ["llen", "lex"]), ('f', ["rank", "ergus"]), ('r', ["alf", "obert"])]
> 
> The drawback is that the size of the `alphabet' contributes to the
> running time. A few pointers are probably helpful: Robert Giegerich and
> Stefan Kurtz have done some work on suffix tree constructions, see
> 
> http://www.techfak.uni-bielefeld.de/techfak/persons/kurtz/publications.html
> 
> HTH, Ralf
> 



Reply via email to