> > > o it uses (++) to catenate the results of the recursive calls (note that
> > >   (++) takes time proportional to the length of its first argument).
> 
> This also seems wierd.  Concatenation is a frequent enough operation on
> lists.  Give such frequency, shouldn't the internal representation of a
> list include a pointer to its last element just to ensure that
> concatenation is constant time?  I realize that you can't point to the end
> of a lazy list, but concatenation is not relevant to lazy lists.

Haskell is a lazy language; a list is usually an expression which
gradually evaluates to a list if forced. Hence your solution is not as
easy to implement as it seems (consider the list [1 ..]). There is
another point: your solution strongly suggests that a destructive
update is used to catenate the two lists which would violate the
principle of referential transparency (consider the expression as ++
as).

If catenation is used frequently one should use a suitable data type
which supports constant catenation. Chris Okasaki's tutorial on
functional data types

http://foxnet.cs.cmu.edu/afs/cs.cmu.edu/project/fox/mosaic/people/cokasaki/papers.html#ssafp96

contains a collection of sequence data types (the paper is _really_ worth
reading).

> You recursively partition the list based on the first character not
> already processed.  So if you have a list like:
> 
> [alex,fergus,robert,allen,frank,ralf]
> 
> At the next stage you have:
> [(a,[alex,allen]),(f,[fergus,frank]),(r,[robert,ralf])]
> 
> It takes at most m passes until each node has only one leaf
> [(a,[(al,[(ale,[alex]),(all,[allen])])]),(f,[(fe,[fergus]),(fr,[frank])]),
>  (r,[(ra,[ralf]),(ro,[robert])])]
> 
> There are many details to getting this right, but this is the general
> idea.
> 
> As I write this, I realize I have no idea how to think about this in
> Haskell.  Does the Haskell type system support this?  How do I describe
> this tree data structure and maintain a list of leaves accross the
> datastructure?  

The data structure you request is a trie or a suffix tree (the Boolean
flag indicates whether the empty list is contained or not).

> data Trie a           =  Leaf [a]                             -- leaf node
>                       |  Node [(a, Trie a)] Bool              -- inner node

The sorting algorithm is relatively easy to formulate if the elements
to be sorted are lists themselves (recall that a string is a list of
characters). To construct a level in the tree one could use bucket sort.
[The empty list must be considered separately.]

> import Array
>
> bucketSort bs x               =  [ b | b@(_, y) <- assocs bins, not (null y) ]
>     where bins                =  accumList bs [ (a, as) | (a : as) <- x ]
>
> accumList                     :: (Ix a) => (a, a) -> [(a, b)] -> Array a [b]
> accumList                     =  accumArray (flip (:)) []

> example                       =  ["alex", "fergus", "robert", "allen", "frank" 
>,"ralf"]

The expression

        bucketSort ('a','z') example

yields

[('a', ["llen", "lex"]), ('f', ["rank", "ergus"]), ('r', ["alf", "obert"])]

The drawback is that the size of the `alphabet' contributes to the
running time. A few pointers are probably helpful: Robert Giegerich and
Stefan Kurtz have done some work on suffix tree constructions, see

http://www.techfak.uni-bielefeld.de/techfak/persons/kurtz/publications.html

HTH, Ralf


Reply via email to