Felipe Almeida Lessa wrote:
On 6/23/07, Bertram Felgenhauer <[EMAIL PROTECTED]> wrote:
"rbwt" implements the corresponding inverse BWT. It's a fun knot tying
exercise.

> rbwt xs = let
> res = sortBy (\(a:as) (b:bs) -> a `compare` b) (zipWith' (:) xs res)
>   in
>     tail . map snd . zip xs $ head res

Indeed it's very fun =). Although I must admit that, even after
staring at the code for some time, I still don't see how it works. Is
it too much to ask for a brief explanation?

I see that zipWith' creates all the lazy list without requiring any
knowledge of the second list, which means that the list to be sorted
is created nicely. But when sortBy needs to compare, say, the first
with the second items, it forces the thunk of the lazy list. But that
thunk depends on the sorted list itself!

What am I missing? ;)

Thanks in advance,


I just read and understood it.  I'll walk through my version:

import Data.List

f `on` g = \x y -> (g x) `f` (g y)

zipWith' f []     _       = []
zipWith' f (x:xs) ~(y:ys) = f x y : zipWith' f xs ys

bwt = map head . sortBy (compare `on` tail) . init . tails . ('\0':)

rbwt xs = tail $ zipWith' (flip const) xs $ head res
  where res = sortBy (compare `on` head) (zipWith' (:) xs res)

The key is that sort and sortBy are very very lazy.

'res' is a finite list of infinite strings.

Consider sorting such a finite list of infinite lists without
the recursive definition:

example =
  let a = [1,3..]
      b = [1,2..]
      c = [2,4..]
      d = [1,4..]
      sorted = sort [a,b,c,d]
      firstFiveOfEach = map (take 5) sorted
  in mapM_ print firstFiveOfEach

Which in ghci produces:

*Main> example
[1,2,3,4,5]
[1,3,5,7,9]
[1,4,7,10,13]
[2,4,6,8,10]

Now for a sidetrack:

example =
  let a = [1,3..]
      b = [1,2..]
      c = [2,4..]
      d = [1,4..]
      sorted = sort [a,b,c,d,c]
      firstFiveOfEach = map (take 5) sorted
  in mapM_ print firstFiveOfEach

Produces the first three elements before 'c' ....

*Main> example
[1,2,3,4,5]
[1,3,5,7,9]
[1,4,7,10,13]

...and then hangs because "compare c c" never terminates.

This lazy sorting is also why "head . sort $ foo" is an O(N) way to get the minimum of a list "foo" in Haskell.

So the code for rbwt depends on this lazy sorting behavior. The recursive nature of the definition is the difficult to (directly) invent part.

The result of (zipWith' (:) xs ___) is a list the same length as 'xs' and where each item is a list with the head item taken from 'xs'. So it has "primed the pump" of the recursive definition by giving the first Char in each of the strings.

Compare with the much simpler 'ones' and 'odds':

ones = 1 : ones
odds = 1 : map (2+) odds

What about the tails of all these lists? They come from res. And res comes from sorting the results of zipWith'. But the sorting is done only by looking at the head element, which is the one that comes from xs. That is the "sortBy (\(a:as) (b:bs) -> a `compare` b)" or "sortBy (compare `on` head)".

So the sorting routine does not care about the ___ in (zipWith' (:) xs ___) at all, since it only examines the head which comes from xs.
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Reply via email to