Re: [Haskell-cafe] ByteString search code available in easy-to-digest form

2007-11-07 Thread Don Stewart
bos:
 I've packaged up the fast Boyer-Moore and Knuth-Morris-Pratt code that 
 Chris Kuklewicz posted a few months ago:
 
   http://article.gmane.org/gmane.comp.lang.haskell.libraries/7363
 
 The consensus at the time was that the code was not ready for rolling 
 into the bytestring package, but now it's easy to install and start 
 working with.
 
 API docs:
 
   http://darcs.serpentine.com/stringsearch/dist/doc/html/stringsearch/
 
 Patches against the darcs repo welcome:
 
   darcs get http://darcs.serpentine.com/stringsearch
 
 Credit to Justin Bailey, Daniel Fischer, and Chris Kuklewicz for their 
 hard work.
 
 (Currently only tested against GHC 6.6.1, FYI.)

Do we have any benchmarks, for say, 1G files, versus linear, naive
(strict) search?

-- Don
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ByteString search code available in easy-to-digest form

2007-11-07 Thread Bryan O'Sullivan

Don Stewart wrote:


Do we have any benchmarks, for say, 1G files, versus linear, naive
(strict) search?


Chris mentioned that he did, but I haven't had time to write anything 
benchmarky yet.


b
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ByteString search code available in easy-to-digest form

2007-11-07 Thread Justin Bailey
On Nov 7, 2007 2:21 PM, Bryan O'Sullivan [EMAIL PROTECTED] wrote:
 Chris mentioned that he did, but I haven't had time to write anything
 benchmarky yet.

I used the attached program to benchmark the various functions against
endo.dna[1], a 7 MB file that came with this year's ICFP contest. It
appends a pattern that occurs nowhere in the file to the end of that
file and then searches for it. Strict and lazy bytestring searches
using KMP are performed, plus a search using the existing bytestring
searches and using a List.

You'll have to change the import from Data.ByteString.KMP for it to
compile but otherwise it should work out of the box..

Justin

[1] http://www.icfpcontest.org/endo.zip
module Main

where

import qualified Data.ByteString.Lazy as L
import qualified Data.ByteString as S
import qualified Data.ByteString.KMP as K
import Data.List (isInfixOf)

main =
  do
testStr - readFile endo.dna = \s - return $ s ++ searchStr
lazyTestStr - L.readFile endo.dna = \s - return $ L.append s lazySearchStr
strictTestStr - S.readFile endo.dna = \s - return $ S.append s strictSearchStr
putStrLn $ ((kmpMatchLL):  ++ show ({-# SCC kmpMatchLL #-} K.kmpMatchLL lazySearchStr lazyTestStr))
putStrLn $ ((kmpMatchSS):  ++ show ({-# SCC main_kmpMatchSS #-} K.kmpMatchSS strictSearchStr strictTestStr))
putStrLn $ ((strict):  ++ show ({-# SCC main_findStrict #-} S.findSubstring strictSearchStr strictTestStr))
putStrLn $ ((naive):  ++ show ({-# SCC main_findSubstringLazy #-} findSubstringLazy lazySearchStr lazyTestStr))
putStrLn $ ((list) found:  ++ show ({-# SCC main_findList #-} searchStr `isInfixOf` testStr))
putStrLn Done!


searchStr = IFPIFPIFPIFPIFPIFPIFPIFP
lazySearchStr = toLazyBS searchStr
strictSearchStr = toStrictBS searchStr

toLazyBS = L.pack . map (toEnum . fromEnum) 
toStrictBS = S.pack . map (toEnum . fromEnum)

findSubstringLazy :: L.ByteString - L.ByteString - Maybe Int
findSubstringLazy !test !big = go big 0
where
go !s !n | test `L.isPrefixOf` s = Just n
 | L.null s  = Nothing
 | otherwise = go (L.tail s) (n+1)

  
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe