> > > o it uses (++) to catenate the results of the recursive calls (note that
> > > (++) takes time proportional to the length of its first argument).
>
> This also seems wierd. Concatenation is a frequent enough operation on
> lists. Give such frequency, shouldn't the internal representation of a
> list include a pointer to its last element just to ensure that
> concatenation is constant time? I realize that you can't point to the end
> of a lazy list, but concatenation is not relevant to lazy lists.
Haskell is a lazy language; a list is usually an expression which
gradually evaluates to a list if forced. Hence your solution is not as
easy to implement as it seems (consider the list [1 ..]). There is
another point: your solution strongly suggests that a destructive
update is used to catenate the two lists which would violate the
principle of referential transparency (consider the expression as ++
as).
If catenation is used frequently one should use a suitable data type
which supports constant catenation. Chris Okasaki's tutorial on
functional data types
http://foxnet.cs.cmu.edu/afs/cs.cmu.edu/project/fox/mosaic/people/cokasaki/papers.html#ssafp96
contains a collection of sequence data types (the paper is _really_ worth
reading).
> You recursively partition the list based on the first character not
> already processed. So if you have a list like:
>
> [alex,fergus,robert,allen,frank,ralf]
>
> At the next stage you have:
> [(a,[alex,allen]),(f,[fergus,frank]),(r,[robert,ralf])]
>
> It takes at most m passes until each node has only one leaf
> [(a,[(al,[(ale,[alex]),(all,[allen])])]),(f,[(fe,[fergus]),(fr,[frank])]),
> (r,[(ra,[ralf]),(ro,[robert])])]
>
> There are many details to getting this right, but this is the general
> idea.
>
> As I write this, I realize I have no idea how to think about this in
> Haskell. Does the Haskell type system support this? How do I describe
> this tree data structure and maintain a list of leaves accross the
> datastructure?
The data structure you request is a trie or a suffix tree (the Boolean
flag indicates whether the empty list is contained or not).
> data Trie a = Leaf [a] -- leaf node
> | Node [(a, Trie a)] Bool -- inner node
The sorting algorithm is relatively easy to formulate if the elements
to be sorted are lists themselves (recall that a string is a list of
characters). To construct a level in the tree one could use bucket sort.
[The empty list must be considered separately.]
> import Array
>
> bucketSort bs x = [ b | b@(_, y) <- assocs bins, not (null y) ]
> where bins = accumList bs [ (a, as) | (a : as) <- x ]
>
> accumList :: (Ix a) => (a, a) -> [(a, b)] -> Array a [b]
> accumList = accumArray (flip (:)) []
> example = ["alex", "fergus", "robert", "allen", "frank"
>,"ralf"]
The expression
bucketSort ('a','z') example
yields
[('a', ["llen", "lex"]), ('f', ["rank", "ergus"]), ('r', ["alf", "obert"])]
The drawback is that the size of the `alphabet' contributes to the
running time. A few pointers are probably helpful: Robert Giegerich and
Stefan Kurtz have done some work on suffix tree constructions, see
http://www.techfak.uni-bielefeld.de/techfak/persons/kurtz/publications.html
HTH, Ralf