> If you read the source code, length does not read the data; that's
> why it is so fast. It cannot be done for UTF-8 strings.
I think at this point most of the amazement is directed at Data.Text
being slower than good old [Char] (at least for this operation - we
should probably expand our view to more than one operation).

> Hey, normal string way faster than GNU wc!

No - you need to perform a fair comparison. Try "wc -m" to count only
characters (not lines and words too; "wc -c" counts bytes). I'd provide
numbers, but my wc doesn't seem to support UTF-8, and I'm not sure
which package contains a Unicode-aware wc.

> readChar :: L.ByteString -> Maybe Int64
> readChar bs = do (c,_) <- L.uncons bs
>                  return (choose (fromEnum c))
>   where
>     choose :: Int -> Int64
>     choose c
>       | c < 0xc0 = 1
>       | c < 0xe0 = 2
>       | c < 0xf0 = 3
>       | c < 0xf8 = 4
>       | otherwise = 1
>
> Inspired by Data.ByteString.Lazy.UTF8; same performance as GNU wc (it
> is cheating because it does not check the validity of the multibyte
> char).

Ah, interesting and a worth-while cheat.
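For concreteness, here is a minimal sketch of how that width table
extends to a full character count over a lazy ByteString. The name
countUtf8Chars is mine, not from the post above, and like readChar it
cheats the same way: it assumes the input is valid UTF-8 and never
inspects the continuation bytes it skips.

    {-# LANGUAGE BangPatterns #-}

    import qualified Data.ByteString.Lazy as L
    import Data.Int (Int64)
    import Data.Word (Word8)

    -- Count UTF-8 characters by looking only at lead bytes: each
    -- lead byte encodes the width of its character, so we can skip
    -- over the continuation bytes without reading them.
    -- Assumption: the input is valid UTF-8 (same cheat as readChar).
    countUtf8Chars :: L.ByteString -> Int64
    countUtf8Chars = go 0
      where
        go :: Int64 -> L.ByteString -> Int64
        go !n bs = case L.uncons bs of
          Nothing     -> n
          Just (c, _) -> go (n + 1) (L.drop (width c) bs)

        -- Width of a character from its lead byte, as in choose.
        width :: Word8 -> Int64
        width c
          | c < 0xc0  = 1
          | c < 0xe0  = 2
          | c < 0xf0  = 3
          | c < 0xf8  = 4
          | otherwise = 1

    main :: IO ()
    main = L.getContents >>= print . countUtf8Chars

The bang pattern keeps the running count strict, so the accumulator
does not build up thunks while walking the string.

Thomas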