Andrew Coppin wrote: > apfelmus wrote: > >Note that the one usually adds an "end of string" character $ in the > >Burrows-Wheeler transform for compression such that sorting rotated > >strings becomes sorting suffices. > > > > Yeah, I noticed that the output from by program can never actually be > reverted to its original form. ;-) Maybe it I alter the code to stick a > 0-byte in there or something...
To recover the original string you either need to store the offset to the first character or add a sentinel character to mark the string end. One advantage of the sentinel character approach is that it's sufficient to sort just the suffixes of the text. A disadvantage is that you need an additional character. The code below reserves \0 for this purpose. >>> Code: >>> "bwt" implements a variation of the Burrows-Wheeler transform, using \0 as a sentinel character for simplicity. The sentinel has to be smaller than all other characters in the string. > bwt xs = let > suffixes = [(a,as) | a:as <- tails ('\0':xs)] > in > map fst . sortBy (\(_,a) (_,b) -> a `compare` b) $ suffixes "rbwt" implements the corresponding inverse BWT. It's a fun knot tying exercise. > rbwt xs = let > res = sortBy (\(a:as) (b:bs) -> a `compare` b) (zipWith' (:) xs res) > in > tail . map snd . zip xs $ head res "zipWith'" is a variant of zipWith that asserts that the third argument has the same shape as the second one. > zipWith' f [] _ = [] > zipWith' f (x:xs) ~(y:ys) = f x y : zipWith' f xs ys <<< End Code <<< Both transforms use linear memory (but quite a lot, admittedly). regards, Bertram _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe