On 13/11/2009 07:04, Bryan O'Sullivan wrote:
I'm working on measuring and improving the performance of the text
library at the moment, and the very first test I tried demonstrated a
piece of behaviour that I'm not completely able to understand. Actually,
I'm not able to understand what's going on at all, beyond a very shallow
level. All the comments below pertain to GHC 6.10.4.
The text library uses stream fusion, and I want to measure the
performance of UTF-8 decoding.
The code I'm measuring is very simple:
import qualified Data.ByteString as B
import Data.Text.Encoding as T
import qualified Data.Text as T
import System.Environment (getArgs)
import Control.Monad (forM_)
main = do
args <- getArgs
forM_ args $ \a -> do
s <- B.readFile a
let t = T.decodeUtf8 s
print (T.length t)
The streamUtf8 function looks roughly like this:
streamUtf8 :: OnDecodeError -> ByteString -> Stream Char
streamUtf8 onErr bs = Stream next 0 (maxSize l)
where
l = B.length bs
next i
| i >= l = Done
| U8.validate1 x1 = Yield (unsafeChr8 x1) (i+1)
| {- etc. -}
{-# INLINE [0] streamUtf8 #-}
The values being Yielded from the inner function are, as you can see,
themselves constructed by functions.
Originally, with the inner next function manually marked as INLINE, I
found that functions like unsafeChr8 were not being inlined by GHC, and
performance was terrible due to the amount of boxing and unboxing
happening in the inner loop.
>
I somehow stumbled on the idea of removing the INLINE annotation from
next, and performance suddenly improved by a significant integer
multiple. This caused the body of streamUtf8 to be inlined into my test
program, as I hoped.
I think I can explain this one, at least partially. When you mark a
function INLINE, GHC does not optimise the body of the function itself,
on the grounds that it will be inlined at the call site anyway and get
optimised there.
Simon just changed this behaviour in GHC 6.12, so that GHC now keeps the
original definition for inlining, but also optimises the original
function as normal, which is useful if we can't or don't want to inline
at a call site for some reason (or indeed if the calling module is being
compiled without -O!).
What we still don't understand, however, is why streamUtf8 was not being
inlined at the call site. That is the root of the problem. We'll need
to look more closely at the call site to understand what's going on.
Cheers,
Simon
(PS if any of what I said contradicts what Simon and/or Roman said, then
please ignore me and not them :-)
_______________________________________________
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users