Gleb Alexeyev wrote: > Bryan O'Sullivan wrote: >> I just posted a library named suffixtree to Hackage. >> >> http://www.serpentine.com/software/suffixtree/ >> >> It implements Giegerich and Kurtz's lazy construction algorithm, with >> a few tweaks for better performance and resource usage. >> >> API docs: >> >> http://darcs.serpentine.com/suffixtree/dist/doc/html/Data-SuffixTree.html >> >> I've tested it on multi-megabyte input strings. >> > I think I found a bug: > import qualified Data.SuffixTree as T > >> T.countRepeats "ab" (T.construct "abab") > 1
That is almost certainly because the algorithm expects the source string to have a unique character at its end. The library should either make that clear or add such a character on its own. Otherwise the "ab" suffix is a prefix of the "abab" suffix and the shorter one gets lost. If you end in "$" then "ab$" cannot merge with "abab$" and there are no distinct suffixes a and b for which (isPrefixOf a b) is true. Example: > *Data.SuffixTree> countRepeats "ab" (construct "abab") > 1 > *Data.SuffixTree> countRepeats "ab" (construct "abab$") > 2 -- Chris _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe