Yup, that's it. It does indeed go nice and fast with GHC 5.04.3, but slows down with 6.2. This is because 6.2 is using the new Data.PackedString library, which is unoptimised. (the code also needs a fix to hashPS for 6.2: indexPS is 0-based rather than 1-based). Cheers, Simon
> Does the code below look familiar? > > /Josef > > \begin{code} > -- compile with: ghc -O -package lang > -- run with: ./a.out +RTS -H10m -K4m <Input.txt > > import MArray > import PackedString > import IOExts > import IO > import Char > import Monad > > arr_size = 20000 > > main = do > h <- openFile "Usr.Dict.Words" ReadMode > sz <- hFileSize h > ps <- hGetPS h (fromIntegral sz) > > tbl <- newArray (0,arr_size) [] > mapM (addToHashTable tbl) (linesPS ps) > > h <- openFile "Input.txt" ReadMode > sz <- hFileSize h > ps <- hGetPS h (fromIntegral sz) > > let test s = do b <- elemHashTable s tbl > when (not b) (putStrLn (unpackPS s)) > mapM test (linesPS ps) > > type HashTable = IOArray Int [PackedString] > > -- Looks bad, but GHC does a great job of optimising it: > hashPS :: PackedString -> Int > hashPS ps = foldr f 0 (map (ord.indexPS ps) [1..lengthPS ps]) > where f n m = n + m * 128 `mod` 1048583 > > addToHashTable :: HashTable -> PackedString -> IO () > addToHashTable tbl s = do > let h = hashPS s > index = h `mod` arr_size > r <- readArray tbl index > if (s `elem` r) then return () else do > writeArray tbl index (s : r) > > elemHashTable :: PackedString -> HashTable -> IO Bool > elemHashTable s tbl = do > let h = hashPS s > index = h `mod` arr_size > r <- readArray tbl index > return (s `elem` r) > \end{code} > _______________________________________________ > Glasgow-haskell-users mailing list > [EMAIL PROTECTED] > http://www.haskell.org/mailman/listinfo/glasgow-haskell-users > _______________________________________________ Glasgow-haskell-users mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/glasgow-haskell-users