Re: [Haskell-cafe] Re: Need for speed: the Burrows-Wheeler Transform

Chris Kuklewicz Sat, 23 Jun 2007 19:56:23 -0700

Felipe Almeida Lessa wrote:

On 6/23/07, Bertram Felgenhauer <[EMAIL PROTECTED]> wrote:

"rbwt" implements the corresponding inverse BWT. It's a fun knot tying
exercise.
> rbwt xs = let
> res = sortBy (\(a:as) (b:bs) -> a `compare` b) (zipWith' (:) xsres)
>   in
>     tail . map snd . zip xs $ head res


Indeed it's very fun =). Although I must admit that, even after
staring at the code for some time, I still don't see how it works. Is
it too much to ask for a brief explanation?

I see that zipWith' creates all the lazy list without requiring any
knowledge of the second list, which means that the list to be sorted
is created nicely. But when sortBy needs to compare, say, the first
with the second items, it forces the thunk of the lazy list. But that
thunk depends on the sorted list itself!

What am I missing? ;)

Thanks in advance,


I just read and understood it.  I'll walk through my version:

import Data.List

f `on` g = \x y -> (g x) `f` (g y)

zipWith' f []     _       = []
zipWith' f (x:xs) ~(y:ys) = f x y : zipWith' f xs ys

bwt = map head . sortBy (compare `on` tail) . init . tails . ('\0':)

rbwt xs = tail $ zipWith' (flip const) xs $ head res
  where res = sortBy (compare `on` head) (zipWith' (:) xs res)


The key is that sort and sortBy are very very lazy.

'res' is a finite list of infinite strings.

Consider sorting such a finite list of infinite lists without
the recursive definition:

example =
  let a = [1,3..]
      b = [1,2..]
      c = [2,4..]
      d = [1,4..]
      sorted = sort [a,b,c,d]
      firstFiveOfEach = map (take 5) sorted
  in mapM_ print firstFiveOfEach


Which in ghci produces:

*Main> example
[1,2,3,4,5]
[1,3,5,7,9]
[1,4,7,10,13]
[2,4,6,8,10]


Now for a sidetrack:

example =
  let a = [1,3..]
      b = [1,2..]
      c = [2,4..]
      d = [1,4..]
      sorted = sort [a,b,c,d,c]
      firstFiveOfEach = map (take 5) sorted
  in mapM_ print firstFiveOfEach


Produces the first three elements before 'c' ....

*Main> example
[1,2,3,4,5]
[1,3,5,7,9]
[1,4,7,10,13]


...and then hangs because "compare c c" never terminates.

This lazy sorting is also why "head . sort $ foo" is an O(N) way to get theminimum of a list "foo" in Haskell.

So the code for rbwt depends on this lazy sorting behavior. The recursivenature of the definition is the difficult to (directly) invent part.

The result of (zipWith' (:) xs ___) is a list the same length as 'xs' and whereeach item is a list with the head item taken from 'xs'. So it has "primed thepump" of the recursive definition by giving the first Char in each of the strings.


Compare with the much simpler 'ones' and 'odds':

ones = 1 : ones
odds = 1 : map (2+) odds

What about the tails of all these lists? They come from res. And res comesfrom sorting the results of zipWith'. But the sorting is done only by lookingat the head element, which is the one that comes from xs. That is the "sortBy(\(a:as) (b:bs) -> a `compare` b)" or "sortBy (compare `on` head)".

So the sorting routine does not care about the ___ in (zipWith' (:) xs ___) atall, since it only examines the head which comes from xs.

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Re: Need for speed: the Burrows-Wheeler Transform

Reply via email to