I noticed something about ByteStream performance that I don't
understand.

I have a test text document:

   $ ls -sh test-text-file
   956K test-text-file


Running this program, using the Prelude's IO functions:

module Main where

main = do
   content <- readFile "test-text-file"
   let l = length . words $ content
   print l

I get:

   $ time ./a.out
   174372

   real    0m0.805s
   user    0m0.720s
   sys     0m0.008s


Running a version of the same thing using Data.ByteStream.Char8:

module Main where

import qualified Data.ByteString.Char8 as B

main = do
   content <- B.readFile "test-text-file"
   let l = length . B.words $ content
   print l

I see a time that is quite a bit slower:

   $ time ./a.out
   174372

   real    0m1.864s
   user    0m1.596s
   sys     0m0.012s


Changing it to incorporate similar code to the implementation of
B.words:

module Main where

import qualified Data.ByteString.Char8 as B
import Data.Char (isSpace)

main = do
   content <- B.readFile "test-text-file"
   let l = length $ filter (not . B.null) $ B.splitWith isSpace
   content
   print l

I see a similar time as with B.words:

   $ time ./a.out
   174372

   real    0m1.835s
   user    0m1.628s
   sys     0m0.012s


And then if I change this to use B.split ' ' instead of isSpace:

module Main where

import qualified Data.ByteString.Char8 as B

main = do
   content <- B.readFile "test-text-file"
   let l = length $ filter (not . B.null) $ B.split ' ' content
   print l

I get a time that's much more reasonable-looking, compared to the
original Prelude.words version:

   $ time ./a.out
   174313

   real    0m0.389s
   user    0m0.312s
   sys     0m0.004s


It seems like the B.splitWith isSpace code is really slow for some
reason. Anybody have any idea what's going on? The actual implementation
is using isSpaceWord8 which is a case statement looking for a pile of
different whitespace characters.


--
 .~.    Dino Morelli
 /V\    email: [EMAIL PROTECTED]
/( )\   irc: dino-
^^-^^   preferred distro: Debian GNU/Linux  http://www.debian.org
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Reply via email to