Hi,

Toby Hutton wrote:
Say I want to put the words 'foo', 'bar' and 'baz' into a binary tree.  The
heuristic requires I split the words into letters and sort them:
'aabbfoorz'.  The heuristic then may decide, based on the sorted letters,
that 'bar' and 'foo' should go in the left child and 'baz' goes in the
right. Typically we'd then simply recurse and for example, the left child's
words would be re-sorted into 'abfoor' and the heuristic is reapplied.

If we assume that sorting is relatively expensive, we can avoid the re-sort
for the children by unmerging the parent's sorted list of letters.  Two
sublists of a sorted list should already be sorted.  If we know which word
each letter belongs to it would be more efficient to tag the letters with
'left' or 'right' as the words are classified. Then we can iterate down the sorted letter list and produce new sorted sublists rather simply.

So it's not actually that complicated, and I can imagine exactly how it
could be done in C but I really don't know how to approach this in Haskell.

What about just storing with each character a "reference" to the word it orignally comes from?

import Data.List (sortBy)

data Tagged = Tag String Char

tag :: Tagged -> String
tag (Tag x _) = x
compareTagged :: Tagged -> Tagged -> Ordering
(Tag _ x) `compareTagged` (Tag _ y) = x `compare` y
tagWord :: String -> [Tagged]
tagWord word = map (Tag word) word

unmerge :: [a] -> (a -> Bool) -> ([a], [a])
unmerge xs p = foldr f ([], []) xs where
  f x (ls, rs) | p x = (x:ls, rs)
               | otherwise = (ls, x:rs)
data Tree = Empty | Leaf String | Node Tree Tree

tree :: ([Tagged] -> String -> Bool) -> [String] -> Tree
tree heuristic words = tree' words sorted where

  -- we only need to sort once ...
  sorted = sortBy compareTagged . concat . map tagWord $ words
tree' [] _ = Empty
  tree' [word] _ = Leaf word
tree' words sorted = let predicate = heuristic sorted
     (leftWords, rightWords) = unmerge words predicate
     -- ... because we reuse the ordered list by unmerging it
     (leftSorted, rightSorted) = unmerge sorted (predicate . tag)
in Node (tree' leftWords leftSorted) (tree' rightWords rightSorted)

  Tillmann

_______________________________________________
Haskell-Cafe mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/haskell-cafe

Reply via email to