Using the code developed for ByteStrings by myself, Christ Kuklewicz
and Daniel Fischer, I've implemented Knuth-Morris-Pratt substring
searching on Data.Sequence "Seq" values. Attached you'll find the
library in kmp.zip.safe. The algorithm is implemented in the module
Data.Sequence.KMP.

At the root, SpeedTest.hs can be compiled on Windows with the
"prof_compile.bat" file included (you'll need to install the
"regex-dfa" and "regex-base" packages from
Hackage to build). SpeedTest searches for a known value in the 7 MB
file "endo.dna" (which can be downloaded from
http://www.icfpcontest.org/endo.zip) using several different
algorithms and methods: strict and lazy bytestrings, regular
expressions, and the KMP algorithm for "Seq" values.

SpeedTest is pretty fast but I worry about its space usage. It may
just be the nature of Seq values, but I cannot get it to run in
constant space when I think it should be. I'm especially interested in
help here.

All comments and feedback are welcome. Since the zip file includes a
darcs context file, feel free to send patches.

Justin

Attachment: kmp.zip.safe
Description: Binary data

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Reply via email to