Re: [Haskell-cafe] ByteString search code available in easy-to-digest form
bos: I've packaged up the fast Boyer-Moore and Knuth-Morris-Pratt code that Chris Kuklewicz posted a few months ago: http://article.gmane.org/gmane.comp.lang.haskell.libraries/7363 The consensus at the time was that the code was not ready for rolling into the bytestring package, but now it's easy to install and start working with. API docs: http://darcs.serpentine.com/stringsearch/dist/doc/html/stringsearch/ Patches against the darcs repo welcome: darcs get http://darcs.serpentine.com/stringsearch Credit to Justin Bailey, Daniel Fischer, and Chris Kuklewicz for their hard work. (Currently only tested against GHC 6.6.1, FYI.) Do we have any benchmarks, for say, 1G files, versus linear, naive (strict) search? -- Don ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] ByteString search code available in easy-to-digest form
Don Stewart wrote: Do we have any benchmarks, for say, 1G files, versus linear, naive (strict) search? Chris mentioned that he did, but I haven't had time to write anything benchmarky yet. b ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] ByteString search code available in easy-to-digest form
On Nov 7, 2007 2:21 PM, Bryan O'Sullivan [EMAIL PROTECTED] wrote: Chris mentioned that he did, but I haven't had time to write anything benchmarky yet. I used the attached program to benchmark the various functions against endo.dna[1], a 7 MB file that came with this year's ICFP contest. It appends a pattern that occurs nowhere in the file to the end of that file and then searches for it. Strict and lazy bytestring searches using KMP are performed, plus a search using the existing bytestring searches and using a List. You'll have to change the import from Data.ByteString.KMP for it to compile but otherwise it should work out of the box.. Justin [1] http://www.icfpcontest.org/endo.zip module Main where import qualified Data.ByteString.Lazy as L import qualified Data.ByteString as S import qualified Data.ByteString.KMP as K import Data.List (isInfixOf) main = do testStr - readFile endo.dna = \s - return $ s ++ searchStr lazyTestStr - L.readFile endo.dna = \s - return $ L.append s lazySearchStr strictTestStr - S.readFile endo.dna = \s - return $ S.append s strictSearchStr putStrLn $ ((kmpMatchLL): ++ show ({-# SCC kmpMatchLL #-} K.kmpMatchLL lazySearchStr lazyTestStr)) putStrLn $ ((kmpMatchSS): ++ show ({-# SCC main_kmpMatchSS #-} K.kmpMatchSS strictSearchStr strictTestStr)) putStrLn $ ((strict): ++ show ({-# SCC main_findStrict #-} S.findSubstring strictSearchStr strictTestStr)) putStrLn $ ((naive): ++ show ({-# SCC main_findSubstringLazy #-} findSubstringLazy lazySearchStr lazyTestStr)) putStrLn $ ((list) found: ++ show ({-# SCC main_findList #-} searchStr `isInfixOf` testStr)) putStrLn Done! searchStr = IFPIFPIFPIFPIFPIFPIFPIFP lazySearchStr = toLazyBS searchStr strictSearchStr = toStrictBS searchStr toLazyBS = L.pack . map (toEnum . fromEnum) toStrictBS = S.pack . map (toEnum . fromEnum) findSubstringLazy :: L.ByteString - L.ByteString - Maybe Int findSubstringLazy !test !big = go big 0 where go !s !n | test `L.isPrefixOf` s = Just n | L.null s = Nothing | otherwise = go (L.tail s) (n+1) ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe