Felipe Almeida Lessa wrote:
On 6/23/07, Bertram Felgenhauer <[EMAIL PROTECTED]> wrote:
"rbwt" implements the corresponding inverse BWT. It's a fun knot tying
exercise.
> rbwt xs = let
> res = sortBy (\(a:as) (b:bs) -> a `compare` b) (zipWith' (:) xs
res)
> in
> tail . map snd . zip xs $ head res
Indeed it's very fun =). Although I must admit that, even after
staring at the code for some time, I still don't see how it works. Is
it too much to ask for a brief explanation?
I see that zipWith' creates all the lazy list without requiring any
knowledge of the second list, which means that the list to be sorted
is created nicely. But when sortBy needs to compare, say, the first
with the second items, it forces the thunk of the lazy list. But that
thunk depends on the sorted list itself!
What am I missing? ;)
Thanks in advance,
I just read and understood it. I'll walk through my version:
import Data.List
f `on` g = \x y -> (g x) `f` (g y)
zipWith' f [] _ = []
zipWith' f (x:xs) ~(y:ys) = f x y : zipWith' f xs ys
bwt = map head . sortBy (compare `on` tail) . init . tails . ('\0':)
rbwt xs = tail $ zipWith' (flip const) xs $ head res
where res = sortBy (compare `on` head) (zipWith' (:) xs res)
The key is that sort and sortBy are very very lazy.
'res' is a finite list of infinite strings.
Consider sorting such a finite list of infinite lists without
the recursive definition:
example =
let a = [1,3..]
b = [1,2..]
c = [2,4..]
d = [1,4..]
sorted = sort [a,b,c,d]
firstFiveOfEach = map (take 5) sorted
in mapM_ print firstFiveOfEach
Which in ghci produces:
*Main> example
[1,2,3,4,5]
[1,3,5,7,9]
[1,4,7,10,13]
[2,4,6,8,10]
Now for a sidetrack:
example =
let a = [1,3..]
b = [1,2..]
c = [2,4..]
d = [1,4..]
sorted = sort [a,b,c,d,c]
firstFiveOfEach = map (take 5) sorted
in mapM_ print firstFiveOfEach
Produces the first three elements before 'c' ....
*Main> example
[1,2,3,4,5]
[1,3,5,7,9]
[1,4,7,10,13]
...and then hangs because "compare c c" never terminates.
This lazy sorting is also why "head . sort $ foo" is an O(N) way to get the
minimum of a list "foo" in Haskell.
So the code for rbwt depends on this lazy sorting behavior. The recursive
nature of the definition is the difficult to (directly) invent part.
The result of (zipWith' (:) xs ___) is a list the same length as 'xs' and where
each item is a list with the head item taken from 'xs'. So it has "primed the
pump" of the recursive definition by giving the first Char in each of the strings.
Compare with the much simpler 'ones' and 'odds':
ones = 1 : ones
odds = 1 : map (2+) odds
What about the tails of all these lists? They come from res. And res comes
from sorting the results of zipWith'. But the sorting is done only by looking
at the head element, which is the one that comes from xs. That is the "sortBy
(\(a:as) (b:bs) -> a `compare` b)" or "sortBy (compare `on` head)".
So the sorting routine does not care about the ___ in (zipWith' (:) xs ___) at
all, since it only examines the head which comes from xs.
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe