On Tue, 2013-03-19 at 20:32 +0000, Don Stewart wrote: > Oh, I forgot the technique of inlining the lazy bytestring chunks, and > processing each chunk seperately. > > $ time ./fast > 4166680 > ./fast 1.25s user 0.07s system 99% cpu 1.325 total > > Essentially inline Lazy.foldlChunks and specializes is (the inliner should > really get that). > And now we have a nice unboxed inner loop, which llvm might spot: > > $ ghc -O2 -funbox-strict-fields fast.hs --make -fllvm > $ time ./fast > 4166680 > ./fast 1.07s user 0.06s system 98% cpu *1.146 total* > > So about 8x faster. Waiting for some non-lazy bytestring benchmarks... :)
You could try something like this using Conduit: {-# LANGUAGE BangPatterns #-} module Main (main) where import Data.Conduit import qualified Data.Conduit.List as L import qualified Data.Conduit.Binary as B import qualified Data.ByteString.Char8 as BS8 main :: IO () main = print =<< runResourceT ( B.sourceFile filename $$ L.fold (\(!a) (!b) -> a + BS8.count ' ' b) (0 :: Int)) where filename = ... Nicolas > > {-# LANGUAGE BangPatterns #-} > > import Data.ByteString.Internal > import Data.ByteString.Unsafe > import qualified Data.ByteString.Char8 as S > import qualified Data.ByteString.Lazy.Char8 as L > import qualified Data.ByteString.Lazy.Internal as L > import System.IO.Posix.MMap.Lazy > > main = do > f <- unsafeMMapFile "test.txt" > print . new 0 $ L.toChunks f > > new :: Int -> [ByteString] -> Int > new i [] = i > new i (x:xs) = new (add i x) xs > > -- jump into the fast path > {-# INLINE add #-} > add :: Int -> ByteString -> Int > add !i !s | S.null s = i > | isSpace' x = add (i+1) xs > | otherwise = add i xs > where T x xs = uncons s > > data T = T !Char ByteString > > uncons s = T (w2c (unsafeHead s)) (unsafeTail s) > > isSpace' c = c == '\n' || c == ' ' > {-# INLINE isSpace' #-} > > > > > On Tue, Mar 19, 2013 at 7:36 PM, Don Stewart <don...@gmail.com> wrote: > > > Just for fun. Here's some improvements. about 6x faster. > > I'd be interested to see what io-streams could do on this. > > > > Using a 250M test file. > > > > -- strict state monad and bang patterns on the uncons and accumulator > > argument: > > > > $ time ./A > > 4166680 > > ./A 8.42s user 0.57s system 99% cpu 9.037 total > > > > -- just write a loop > > > > $ time ./A > > 4166680 > > ./A 3.84s user 0.26s system 99% cpu 4.121 total > > > > -- switch to Int > > > > $ time ./A > > 4166680 > > ./A 1.89s user 0.23s system 99% cpu 2.134 total > > > > -- custom isSpace function > > > > $ time ./A > > 4166680 > > ./A 1.56s user 0.24s system 99% cpu 1.808 total > > > > -- mmap IO > > > > $ time ./A > > 4166680 > > ./A 1.54s user 0.09s system 99% cpu 1.636 total > > > > Here's the final program: > > > > > > {-# LANGUAGE BangPatterns #-} > > > > import qualified Data.ByteString as S > > import qualified Data.ByteString.Lazy.Char8 as L > > import System.IO.Posix.MMap.Lazy > > > > main = do > > f <- unsafeMMapFile "test.txt" > > print $ go 0 f > > where > > go :: Int -> L.ByteString -> Int > > go !a !s = case L.uncons s of > > Nothing -> a > > Just (x,xs) | isSpaceChar8 x -> go (a+1) xs > > | otherwise -> go a xs > > > > isSpaceChar8 c = c == '\n' || c == ' ' > > {-# INLINE isSpaceChar8 #-} > > > > > > On Mon, Mar 18, 2013 at 8:53 AM, Konstantin Litvinenko < > > to.darkan...@gmail.com> wrote: > > > >> Hi All! > >> > >> I tune my toy project for performance and hit the wall on simple, in > >> imperative world, task. Here is the code that model what I'm trying to > >> achieve > >> > >> import qualified Data.ByteString.Lazy as L > >> import Data.Word8(isSpace) > >> import Data.Word > >> import Control.Monad.State > >> > >> type Stream = State L.ByteString > >> > >> get_byte :: Stream (Maybe Word8) > >> get_byte = do > >> s <- get > >> case L.uncons s of > >> Nothing -> return Nothing > >> Just (x, xs) -> put xs >> return (Just x) > >> > >> main = do > >> f <- L.readFile "test.txt" > >> let r = evalState count_spaces f > >> print r > >> where > >> count_spaces = go 0 > >> where > >> go a = do > >> x <- get_byte > >> case x of > >> Just x' -> if isSpace x' then go (a + 1) else go a > >> Nothing -> return a > >> > >> It takes the file and count spaces, in imperative way, consuming bytes > >> one by one. The problem is: How to rewrite this to get rid of constant > >> allocation of state but still working with stream of bytes? I can rewrite > >> this as one-liner L.foldl, but that doesn't help me in any way to optimize > >> my toy project where all algorithms build upon consuming stream of bytes. > >> > >> PS. My main lang is C++ over 10 years and I only learn Haskell :) > >> > >> > >> ______________________________**_________________ > >> Haskell-Cafe mailing list > >> Haskell-Cafe@haskell.org > >> http://www.haskell.org/**mailman/listinfo/haskell-cafe<http://www.haskell.org/mailman/listinfo/haskell-cafe> > >> > > > > > _______________________________________________ > Haskell-Cafe mailing list > Haskell-Cafe@haskell.org > http://www.haskell.org/mailman/listinfo/haskell-cafe _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe