Hi I've talked to John a bit, and discussed test cases etc. I've tracked this down a little way.
Given the attached file, compiling witih SHORT_EXPORT_LIST makes the code go _slower_. By exporting the "print_lines" function the code doubles in speed. This runs against everything I was expecting, and that Simon has described. Taking a look at the .hi files for the two alternatives, there are two differences: 1) In the faster .hi file, the body of print_lines is exported. This is reasonable and expected. 2) In the faster .hi file, there are additional specialisations, which seemingly have little/nothing to do with print_lines, but are omitted if it is not exported: "SPEC >>= [GHC.IOBase.IO]" ALWAYS forall @ el $dMonad :: GHC.Base.Monad GHC.IOBase.IO Sound.IterateeM.>>= @ GHC.IOBase.IO @ el $dMonad = Sound.IterateeM.a `cast` (forall el1 a b. Sound.IterateeM.IterateeGM el1 GHC.IOBase.IO a -> (a -> Sound.IterateeM.IterateeGM el1 GHC.IOBase.IO b) -> trans (sym ((GHC.IOBase.:CoIO) (Sound.IterateeM.IterateeG el1 GHC.IOBase.IO b))) (sym ((Sound.IterateeM.:CoIterateeGM) el1 GHC.IOBase.IO b))) @ el "SPEC Sound.IterateeM.$f2 [GHC.IOBase.IO]" ALWAYS forall @ el $dMonad :: GHC.Base.Monad GHC.IOBase.IO Sound.IterateeM.$f2 @ GHC.IOBase.IO @ el $dMonad = Sound.IterateeM.$s$f2 @ el "SPEC Sound.IterateeM.$f2 [GHC.IOBase.IO]" ALWAYS forall @ el $dMonad :: GHC.Base.Monad GHC.IOBase.IO Sound.IterateeM.$f2 @ GHC.IOBase.IO @ el $dMonad = Sound.IterateeM.$s$f21 @ el "SPEC Sound.IterateeM.liftI [GHC.IOBase.IO]" ALWAYS forall @ el @ a $dMonad :: GHC.Base.Monad GHC.IOBase.IO Sound.IterateeM.liftI @ GHC.IOBase.IO @ el @ a $dMonad = Sound.IterateeM.$sliftI @ el @ a "SPEC return [GHC.IOBase.IO]" ALWAYS forall @ el $dMonad :: GHC.Base.Monad GHC.IOBase.IO Sound.IterateeM.return @ GHC.IOBase.IO @ el $dMonad = Sound.IterateeM.a7 `cast` (forall el1 a. a -> trans (sym ((GHC.IOBase.:CoIO) (Sound.IterateeM.IterateeG el1 GHC.IOBase.IO a))) (sym ((Sound.IterateeM.:CoIterateeGM) el1 GHC.IOBase.IO a))) @ el My guess is that these cause the slowdown - but is there any reason that print_lines not being exported should cause them to be omitted? All these tests were run on GHC 6.10.1 with -O2. Thanks Neil On Fri, Nov 21, 2008 at 10:33 AM, Simon Peyton-Jones <[EMAIL PROTECTED]> wrote: > | This project is based on Oleg's Iteratee code; I started using his > | IterateeM.hs and Enumerator.hs files and added my own stuff to > | Enumerator.hs (thanks Oleg, great work as always). When I started > | cleaning up by moving my functions from Enumerator.hs to MyEnum.hs, my > | minimal test case increased from 19s to 43s. > | > | I've found two factors that contributed. When I was cleaning up, I > | also removed a bunch of unused functions from IterateeM.hs (some of > | the test functions and functions specific to his running example of > | HTTP encoding). When I added those functions back in, and added > | INLINE pragmas to the exported functions in MyEnum.hs, I got the > | performance back. > | > | In general I hadn't added export lists to the modules yet, so all > | functions should have been exported. > > I'm totally snowed under with backlog from my recent absence, so I can't look > at this myself, but if anyone else wants to I'd be happy to support with > advice and suggestions. > > In general, having an explicit export list is good for performance. I typed > an extra section in the GHC performance resource > http://haskell.org/haskellwiki/Performance/GHC to explain why. In general > that page is where we should document user advice for performance in GHC. > > I can't explain why *adding* unused functions would change performance though! > > Simon > > > _______________________________________________ > Glasgow-haskell-users mailing list > Glasgow-haskell-users@haskell.org > http://www.haskell.org/mailman/listinfo/glasgow-haskell-users >
{-# LANGUAGE CPP #-} {- This file was originally take from http://okmij.org/ftp/Haskell/Iteratee/, and modified to suit this project -} -- #define SHORT_EXPORT_LIST 1 module Sound.IterateeM ( StreamG (..), IterateeG (..), IterateeGM (..), liftI, (>>==), (==<<), stream2list, sbreak, sdropWhile, snext, speek, skip_till_eof, sdrop, stake, map_stream, EnumeratorGM, enum_eof, enum_err, (>.), enum_pure_1chunk, enum_pure_nchunk, enum_h, enum_file, #ifdef SHORT_EXPORT_LIST #else print_lines, #endif ) {- -- #ifdef SHORT_EXPORT_LIST -- #else -- Iteratee, -- IterateeM, -- Stream, -- Line, -- line, -- print_lines, -- enum_lines, -- enum_words, -- EnumeratorM, -- enum_chunk_decoded, -- #endif -- ) -- #else -- #ifdef FULL_EXPORT_LIST -- module Sound.IterateeM ( -- StreamG (..), -- IterateeG (..), -- IterateeGM (..), -- liftI, -- (>>==), -- (==<<), -- stream2list, -- sbreak, -- sdropWhile, -- snext, -- speek, -- skip_till_eof, -- sdrop, -- stake, -- map_stream, -- EnumeratorGM, -- enum_eof, -- enum_err, -- (>.), -- enum_pure_1chunk, -- enum_pure_nchunk, -- enum_h, -- enum_file, -- Iteratee, -- IterateeM, -- Stream, -- Line, -- line, -- print_lines, -- enum_lines, -- enum_words, -- EnumeratorM, -- enum_chunk_decoded -- ) -- #else -- module Sound.IterateeM -- #endif -- #endif -} where import Foreign.Marshal.Alloc import Foreign.Marshal.Array import Data.List (splitAt) import Data.Char (isHexDigit, digitToInt, isSpace) import Data.Word (Word8) import Control.Monad.Trans import Control.Monad.Identity import Control.OldException (try) import System.IO -- A stream is a (continuing) sequence of elements bundled in Chunks. -- The first two variants indicate termination of the stream. -- Chunk [a] gives the currently available part of the stream. -- The stream is not terminated yet. -- The case (Chunk []) signifies a stream with no currently available -- data but which is still continuing. A stream processor should, -- informally speaking, ``suspend itself'' and wait for more data -- to arrive. data StreamG a = EOF | Err String | Chunk [a] deriving Show -- Iteratee -- a generic stream processor, what is being folded over -- a stream -- When Iteratee is in the 'done' state, it contains the computed -- result and the remaining part of the stream. -- In the 'cont' state, the iteratee has not finished the computation -- and needs more input. -- We assume that all iteratees are `good' -- given bounded input, -- they do the bounded amount of computation and take the bounded amount -- of resources. The monad m describes the sort of computations done -- by the iteratee as it processes the stream. The monad m could be -- the identity monad (for pure computations) or the IO monad -- (to let the iteratee store the stream processing results as they -- are computed). -- We also assume that given a terminated stream, an iteratee -- moves to the done state, so the results computed so far could be returned. -- We could have used existentials instead, by doing the closure conversion data IterateeG el m a = IE_done a (StreamG el) | IE_cont (StreamG el -> IterateeGM el m a) newtype IterateeGM el m a = IM{unIM:: m (IterateeG el m a)} #ifdef HIDE_EXTRA_CODE #else type Iteratee m a = IterateeG Char m a type IterateeM m a = IterateeGM Char m a type Stream = StreamG Char #endif -- Useful combinators for implementing iteratees and enumerators liftI :: Monad m => IterateeG el m a -> IterateeGM el m a liftI = IM . return {-# INLINE liftI #-} -- Just like bind (at run-time, this is indeed exactly bind) infixl 1 >>== (>>==):: Monad m => IterateeGM el m a -> (IterateeG el m a -> IterateeGM el' m b) -> IterateeGM el' m b m >>== f = IM (unIM m >>= unIM . f) {-# INLINE (>>==) #-} -- Just like an application -- a call-by-value-like application infixr 1 ==<< (==<<) :: Monad m => (IterateeG el m a -> IterateeGM el' m b) -> IterateeGM el m a -> IterateeGM el' m b f ==<< m = m >>== f -- It turns out, IterateeGM form a monad. We can use the familiar do -- notation for composing Iteratees instance Monad m => Monad (IterateeGM el m) where return x = liftI $ IE_done x (Chunk []) m >>= f = m >>== docase where docase (IE_done a (Chunk [])) = f a docase (IE_done a stream) = f a >>== (\r -> case r of IE_done x _ -> liftI $ IE_done x stream IE_cont k -> k stream) docase (IE_cont k) = liftI $ IE_cont ((>>= f) . k) instance MonadTrans (IterateeGM el) where lift m = IM (m >>= unIM . return) -- ------------------------------------------------------------------------ -- Primitive iteratees -- Read a stream to the end and return all of its elements as a list stream2list :: Monad m => IterateeGM el m [el] stream2list = liftI $ IE_cont (step []) where step acc (Chunk []) = liftI $ IE_cont (step acc) step acc (Chunk ls) = liftI $ IE_cont (step $ acc ++ ls) step acc stream = liftI $ IE_done acc stream -- ------------------------------------------------------------------------ -- Parser combinators -- The analogue of List.break -- It takes an element predicate and returns a pair: -- (str, Just c) -- the element 'c' is the first element of the stream -- satisfying the break predicate; -- The list str is the prefix of the stream up -- to but including 'c' -- (str,Nothing) -- The stream is terminated with EOF or error before -- any element satisfying the break predicate was found. -- str is the scanned part of the stream. -- None of the element in str satisfy the break predicate. sbreak :: Monad m => (el -> Bool) -> IterateeGM el m ([el],Maybe el) sbreak cpred = liftI $ IE_cont (liftI . step []) where step before (Chunk []) = IE_cont (liftI . step before) step before (Chunk str) = case break cpred str of (_,[]) -> IE_cont (liftI . step (before ++ str)) (str',c:tail') -> done (before ++ str') (Just c) (Chunk tail') step before stream = done before Nothing stream done line' char stream = IE_done (line',char) stream -- A particular optimized case of the above: skip all elements of the stream -- satisfying the given predicate -- until the first element -- that does not satisfy the predicate, or the end of the stream. -- This is the analogue of List.dropWhile sdropWhile :: Monad m => (el -> Bool) -> IterateeGM el m () sdropWhile cpred = liftI $ IE_cont step where step (Chunk []) = sdropWhile cpred step (Chunk str) = case dropWhile cpred str of [] -> sdropWhile cpred str' -> liftI $ IE_done () (Chunk str') step stream = liftI $ IE_done () stream -- Attempt to read the next element of the stream -- Return (Just c) if successful, return Nothing if the stream is -- terminated (by EOF or an error) snext :: Monad m => IterateeGM el m (Maybe el) snext = {-# SCC "snext" #-} liftI $ IE_cont step where step (Chunk []) = snext step (Chunk (c:t)) = liftI $ IE_done (Just c) (Chunk t) step stream = liftI $ IE_done Nothing stream -- Look ahead at the next element of the stream, without removing -- it from the stream. -- Return (Just c) if successful, return Nothing if the stream is -- terminated (by EOF or an error) speek :: Monad m => IterateeGM el m (Maybe el) speek = liftI $ IE_cont step where step (Chunk []) = speek step s@(Chunk (c:_)) = liftI $ IE_done (Just c) s step stream = liftI $ IE_done Nothing stream -- Skip the rest of the stream skip_till_eof :: Monad m => IterateeGM el m () skip_till_eof = liftI $ IE_cont step where step (Chunk _) = skip_till_eof step _ = return () -- Skip n elements of the stream, if there are that many -- This is the analogue of List.drop sdrop :: Monad m => Int -> IterateeGM el m () sdrop 0 = return () sdrop n = liftI $ IE_cont step where step (Chunk str) | length str <= n = sdrop (n - length str) step (Chunk str) = liftI $ IE_done () (Chunk s2) where (_s1,s2) = splitAt n str step stream = liftI $ IE_done () stream -- Read n elements from a stream and apply the given iteratee to the -- stream of the read elements. Unless the stream is terminated early, we -- read exactly n elements (even if the iteratee has accepted fewer). -- This procedure shows a different way of composing two iteratees: -- `vertical' rather than `horizontal' stake :: Monad m => Int -> IterateeG el m a -> IterateeGM el m (IterateeG el m a) stake 0 iter = return iter stake n [EMAIL PROTECTED] = sdrop n >> return iter stake n (IE_cont k) = {-# SCC "stake" #-} liftI $ IE_cont step where step (Chunk []) = liftI $ IE_cont step step chunk@(Chunk str) | length str <= n = stake (n - length str) ==<< k chunk step (Chunk str) = done (Chunk s1) (Chunk s2) where (s1,s2) = splitAt n str step stream = done stream stream done s1 s2 = k s1 >>== \r -> liftI $ IE_done r s2 -- Map the stream: yet another iteratee transformer -- Given the stream of elements of the type el and the function el->el', -- build a nested stream of elements of the type el' and apply the -- given iteratee to it. -- Note the contravariance map_stream :: Monad m => (el -> el') -> IterateeG el' m a -> IterateeGM el m (IterateeG el' m a) map_stream _f [EMAIL PROTECTED] = return iter map_stream f (IE_cont k) = liftI $ IE_cont step where step (Chunk []) = liftI $ IE_cont step step (Chunk str) = k (Chunk (map f str)) >>== map_stream f step EOF = k EOF >>== \r -> liftI $ IE_done r EOF step (Err err) = k (Err err) >>== \r -> liftI $ IE_done r (Err err) -- ------------------------------------------------------------------------ -- Combining the primitive iteratees to solve the running problem: -- Reading headers and the content from an HTTP-like stream #ifdef HIDE_EXTRA_CODE #else -- TODO: The large performance change happens somewhere in this block. type Line = String -- The line of text, terminators are not included -- Read the line of text from the stream -- The line can be terminated by CR, LF or CRLF. -- Return (Right Line) if successful. Return (Left Line) if EOF or -- a stream error were encountered before the terminator is seen. -- The returned line is the string read so far. -- The code is the same as that of pure Iteratee, only the signature -- has changed. -- Compare the code below with GHCBufferIO.line_lazy line :: Monad m => IterateeM m (Either Line Line) line = sbreak (\c -> c == '\r' || c == '\n') >>= check_next where check_next (line',Just '\r') = speek >>= \c -> case c of Just '\n' -> snext >> return (Right line') Just _ -> return (Right line') Nothing -> return (Left line') check_next (line',Just _) = return (Right line') check_next (line',Nothing) = return (Left line') -- Line iteratees: processors of a stream whose elements are made of Lines -- Collect all read lines and return them as a list -- see stream2list -- Print lines as they are received. This is the first `impure' iteratee -- with non-trivial actions during chunk processing {-# NOINLINE print_lines #-} print_lines :: IterateeGM Line IO () print_lines = liftI $ IE_cont step where step (Chunk []) = print_lines step (Chunk ls) = lift (mapM_ pr_line ls) >> print_lines step EOF = lift (putStrLn ">> natural end") >> liftI (IE_done () EOF) step stream = lift (putStrLn ">> unnatural end") >> liftI (IE_done () stream) pr_line line' = putStrLn $ ">> read line: " ++ line' -- Convert the stream of characters to the stream of lines, and -- apply the given iteratee to enumerate the latter. -- The stream of lines is normally terminated by the empty line. -- When the stream of characters is terminated, the stream of lines -- is also terminated, abnormally. -- This is the first proper iteratee-enumerator: it is the iteratee of the -- character stream and the enumerator of the line stream. enum_lines :: Monad m => IterateeG Line m a -> IterateeGM Char m (IterateeG Line m a) enum_lines [EMAIL PROTECTED] = return iter enum_lines (IE_cont k) = line >>= check_line k where check_line k' (Right "") = enum_lines ==<< k' EOF -- empty line, normal term check_line k' (Right l) = enum_lines ==<< k' (Chunk [l]) check_line k' _ = enum_lines ==<< k' (Err "EOF") -- abnormal termin -- Convert the stream of characters to the stream of words, and -- apply the given iteratee to enumerate the latter. -- Words are delimited by white space. -- This is the analogue of List.words -- It is instructive to compare the code below with the code of -- List.words, which is: -- words :: String -> [String] -- words s = case dropWhile isSpace s of -- "" -> [] -- s' -> w : words s'' -- where (w, s'') = -- break isSpace s' -- One should keep in mind that enum_words is a more general, monadic -- function. enum_words :: Monad m => IterateeG String m a -> IterateeGM Char m (IterateeG String m a) enum_words [EMAIL PROTECTED] = return iter enum_words (IE_cont k) = sdropWhile isSpace >> sbreak isSpace >>= check_word k where check_word k' ("",_) = enum_words ==<< k' EOF check_word k' (str,_) = enum_words ==<< k' (Chunk [str]) #endif -- ------------------------------------------------------------------------ -- Enumerators -- Each enumerator takes an iteratee and returns an iteratee -- an Enumerator is an iteratee transformer. -- The enumerator normally stops when the stream is terminated -- or when the iteratee moves to the done state, whichever comes first. -- When to stop is of course up to the enumerator... -- We have two choices of composition: compose iteratees or compose -- enumerators. The latter is useful when one iteratee -- reads from the concatenation of two data sources. type EnumeratorGM el m a = IterateeG el m a -> IterateeGM el m a --type EnumeratorM m a = EnumeratorGM Char m a type EnumeratorM m a = EnumeratorGM Word8 m a -- The most primitive enumerator: applies the iteratee to the terminated -- stream. The result is the iteratee usually in the done state. enum_eof :: Monad m => EnumeratorGM el m a enum_eof (IE_done x _) = liftI $ IE_done x EOF enum_eof (IE_cont k) = k EOF -- Another primitive enumerator: report an error enum_err :: Monad m => String -> EnumeratorGM el m a enum_err str (IE_done x _) = liftI $ IE_done x (Err str) enum_err str (IE_cont k) = k (Err str) -- The composition of two enumerators: essentially the functional composition -- It is convenient to flip the order of the arguments of the composition -- though: in e1 >. e2, e1 is executed first (>.):: Monad m => EnumeratorGM el m a -> EnumeratorGM el m a -> EnumeratorGM el m a e1 >. e2 = (e2 ==<<) . e1 -- The pure 1-chunk enumerator -- It passes a given list of elements to the iteratee in one chunk -- This enumerator does no IO and is useful for testing of base parsing enum_pure_1chunk :: Monad m => [el] -> EnumeratorGM el m a enum_pure_1chunk _str [EMAIL PROTECTED] = liftI $ iter enum_pure_1chunk str (IE_cont k) = k (Chunk str) -- The pure n-chunk enumerator -- It passes a given lift of elements to the iteratee in n chunks -- This enumerator does no IO and is useful for testing of base parsing -- and handling of chunk boundaries enum_pure_nchunk :: Monad m => [el] -> Int -> EnumeratorGM el m a enum_pure_nchunk _str _n [EMAIL PROTECTED] = liftI $ iter enum_pure_nchunk [] _n iter = liftI $ iter enum_pure_nchunk str n (IE_cont k) = enum_pure_nchunk s2 n ==<< k (Chunk s1) where (s1,s2) = splitAt n str -- enumerator of a filehandle. -- POSIX descriptors, alas, are not portable to Windows enum_h :: Handle -> EnumeratorM IO a enum_h h iter' = IM $ allocaBytes (fromIntegral buffer_size) $ loop iter' where buffer_size = 4096 loop [EMAIL PROTECTED] _p = return iter loop iter@(IE_cont step) p = do n <- try $ hGetBuf h p buffer_size case n of Left _ex -> unIM $ step (Err "IO error") Right 0 -> return iter Right j -> do str <- peekArray (fromIntegral j) p im <- unIM $ step (Chunk str) loop im p enum_file :: FilePath -> EnumeratorM IO a enum_file filepath iter = IM $ do #ifdef SHORT_EXPORT_LIST print "SHORT" #else print "LONG" #endif print $ "reading file" ++ show filepath h <- openBinaryFile filepath ReadMode r <- unIM $ enum_h h iter hClose h return r #ifdef HIDE_EXTRA_CODE #else -- HTTP chunk decoding -- Each chunk has the following format: -- -- <chunk-size> CRLF <chunk-data> CRLF -- -- where <chunk-size> is the hexadecimal number; <chunk-data> is a -- sequence of <chunk-size> bytes. -- The last chunk (so-called EOF chunk) has the format -- 0 CRLF CRLF (where 0 is an ASCII zero, a character with the decimal code 48). -- For more detail, see "Chunked Transfer Coding", Sec 3.6.1 of -- the HTTP/1.1 standard: -- http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.6.1 -- The following enum_chunk_decoded has the signature of the enumerator -- of the nested (encapsulated and chunk-encoded) stream. It receives -- an iteratee for the embedded stream and returns the iteratee for -- the base, embedding stream. Thus what is an enumerator and what -- is an iteratee may be a matter of perspective. -- We have a decision to make: Suppose an iteratee has finished (either because -- it obtained all needed data or encountered an error that makes further -- processing meaningless). While skipping the rest of the stream/the trailer, -- we encountered a framing error (e.g., missing CRLF after chunk data). -- What do we do? We chose to disregard the latter problem. -- Rationale: when the iteratee has finished, we are in the process -- of skipping up to the EOF (draining the source). -- Disregarding the errors seems OK then. -- Also, the iteratee may have found an error and decided to abort further -- processing. Flushing the remainder of the input is reasonable then. -- One can make a different choice... enum_chunk_decoded :: Monad m => Iteratee m a -> IterateeM m a enum_chunk_decoded = docase where docase [EMAIL PROTECTED] = liftI iter >>= (\r -> (enum_chunk_decoded ==<< skip_till_eof) >> return r) docase iter@(IE_cont k) = line >>= check_size where check_size (Right "0") = line >> k EOF check_size (Right str) = maybe (k . Err $ "Bad chunk size: " ++ str) (read_chunk iter) $ read_hex 0 str check_size _ = k (Err "Error reading chunk size") read_chunk iter size = do r <- stake size iter c1 <- snext c2 <- snext case (c1,c2) of (Just '\r',Just '\n') -> docase r _ -> (enum_chunk_decoded ==<< skip_till_eof) >> enum_err "Bad chunk trailer" r read_hex acc "" = Just acc read_hex acc (d:rest) | isHexDigit d = read_hex (16*acc + digitToInt d) rest read_hex _acc _ = Nothing -- ------------------------------------------------------------------------ -- Tests -- Pure tests, requiring no IO test_str1 :: String test_str1 = "header1: v1\rheader2: v2\r\nheader3: v3\nheader4: v4\n" ++ "header5: v5\r\nheader6: v6\r\nheader7: v7\r\n\nrest\n" testp1 :: Bool testp1 = let IE_done (IE_done lines' EOF) (Chunk rest) = runIdentity . unIM $ enum_pure_1chunk test_str1 ==<< (enum_lines ==<< stream2list) in lines' == ["header1: v1","header2: v2","header3: v3","header4: v4", "header5: v5","header6: v6","header7: v7"] && rest == "rest\n" testp2 :: Bool testp2 = let IE_done (IE_done lines' EOF) (Chunk rest) = runIdentity . unIM $ enum_pure_nchunk test_str1 5 ==<< (enum_lines ==<< stream2list) in lines' == ["header1: v1","header2: v2","header3: v3","header4: v4", "header5: v5","header6: v6","header7: v7"] && rest == "r" testw1 :: Bool testw1 = let test_str = "header1: v1\rheader2: v2\r\nheader3:\t v3" expected = ["header1:","v1","header2:","v2","header3:","v3"] in let run_test test_str' = let IE_done (IE_done words' EOF) EOF = runIdentity . unIM $ (enum_pure_nchunk test_str' 5 >. enum_eof) ==<< (enum_words ==<< stream2list) in words' in and [run_test test_str == expected, run_test (test_str ++ " ") == expected] -- Test Fd driver {- test_driver line_collector filepath = do fd <- openFd filepath ReadOnly Nothing defaultFileFlags putStrLn "About to read headers" result <- unIM $ (enum_fd fd >. enum_eof) ==<< read_lines_and_one_more_line closeFd fd putStrLn "Finished reading headers" case result of IE_done (IE_done headers EOF,after) _ -> do putStrLn $ "The line after headers is: " ++ show after putStrLn "Complete headers" print headers IE_done (IE_done headers err,_) stream -> do putStrLn $ "Problem " ++ show stream putStrLn "Incomplete headers" print headers where read_lines_and_one_more_line = do lines <- enum_lines ==<< line_collector after <- line return (lines,after) test11 = test_driver stream2list "test1.txt" test12 = test_driver stream2list "test2.txt" test13 = test_driver stream2list "test3.txt" test14 = test_driver stream2list "/dev/null" test21 = test_driver print_lines "test1.txt" test22 = test_driver print_lines "test2.txt" test23 = test_driver print_lines "test3.txt" test24 = test_driver print_lines "/dev/null" -} -- Run the complete test, reading the headers and the body -- This simple iteratee is used to process a variety of streams: -- embedded, interleaved, etc. line_printer :: IterateeGM Char IO (IterateeG Line IO ()) line_printer = enum_lines ==<< print_lines -- Two sample processors -- Read the headers, print the headers, read the lines of the chunk-encoded -- body and print each line as it has been read read_headers_print_body :: IterateeGM Char IO (IterateeG Line IO ()) read_headers_print_body = do headers' <- enum_lines ==<< stream2list case headers' of IE_done headers EOF -> lift $ do putStrLn "Complete headers" print headers IE_done headers (Err err) -> lift $ do putStrLn $ "Incomplete headers due to " ++ err print headers _ -> lift $ putStrLn "Pattern not matched" lift $ putStrLn "\nLines of the body follow" enum_chunk_decoded ==<< line_printer -- Read the headers and print the header right after it has been read -- Read the lines of the chunk-encoded body and print each line as -- it has been read print_headers_print_body :: IterateeGM Char IO (IterateeG Line IO ()) print_headers_print_body = do lift $ putStrLn "\nLines of the headers follow" line_printer lift $ putStrLn "\nLines of the body follow" enum_chunk_decoded ==<< line_printer {- test_driver_full iter filepath = do fd <- openFd filepath ReadOnly Nothing defaultFileFlags putStrLn "About to read headers" unIM $ (enum_fd fd >. enum_eof) ==<< iter closeFd fd putStrLn "Finished reading" test31 = test_driver_full read_headers_print_body "test_full1.txt" test32 = test_driver_full read_headers_print_body "test_full2.txt" test33 = test_driver_full read_headers_print_body "test_full3.txt" test34 = test_driver_full print_headers_print_body "test_full3.txt" -- Interleaved reading from two descriptors using select -- -- If the two arguments are the names of regular files, the driver -- does simple round-robin interleaving, reading a block from one -- file and a block from the other file. If the arguments name -- pipes or devices, the reading becomes truly supply-driven. -- We use select for multiplexing. -- The first argument is the reader-iteratee. It is exactly -- the same iteratee that is being used in the `sequential' tests above. -- By design, two Fds are being read independently and in parallel, -- closely emulating two OS processes each reading from their own file. -- The code below is a simple, round-robin OS scheduler. test_driver_mux iter fpath1 fpath2 = do fd1 <- openFd fpath1 ReadOnly Nothing defaultFileFlags fd2 <- openFd fpath2 ReadOnly Nothing defaultFileFlags let fds = [fd1,fd2] putStrLn $ "Opened file descriptors: " ++ show fds mapM (\(fd,reader) -> unIM reader >>= return . ((,) fd)) (zip fds (repeat iter)) >>= allocaBytes (fromIntegral buffer_size) . loop mapM_ closeFd fds putStrLn $ "Closed file descriptors. All done" where -- we use one single IO buffer for reading buffer_size = 5 -- for tests; in real life, there should be 1024 or so loop fjque buf = do let fds = get_fds fjque if null fds then return () else do selected <- select'read'pending fds case selected of Left errno -> putStrLn "IO Err" >> tell_iteratee_err "IO Err" fjque >> return () Right [] -> loop fjque buf Right sel -> process buf sel fjque -- get Fds from the jobqueue for the unfinished iteratees get_fds = foldr (\ (fd,iter) acc -> case iter of {IE_cont _ -> fd:acc; _ -> acc}) [] -- find the first ready jobqueue element, -- that is, the job queue element whose Fd is in selected. -- Return the element and the rest of the queue get_ready selected jq = (e, before ++ after) where (before,e:after) = break (\(fd,_) -> fd `elem` selected) jq process buf selected fjque = do let ((fd,IE_cont step),fjrest) = get_ready selected fjque n <- myfdRead fd buf buffer_size putStrLn $ unwords ["Read buffer, size", either (const "IO err") show n, "from fd", show fd] case n of Left errno -> unIM (step (Err "IO error")) >> loop fjrest buf Right 0 -> unIM (step EOF) >> loop fjrest buf Right n -> do str <- peekCAStringLen (buf,fromIntegral n) im <- unIM $ step (Chunk str) loop (fjrest ++ [(fd,im)]) buf -- round-robin tell_iteratee_err err = mapM_ (\ (_,iter) -> unIM (enum_err err iter)) -- Running these tests shows true interleaving, of reading from the -- two file descriptors and of printing the results. All IO is interleaved, -- and yet it is safe. No unsafe operations are used. testm1 = test_driver_mux line_printer "test1.txt" "test3.txt" testm2 = test_driver_mux print_headers_print_body "test_full2.txt" "test_full3.txt" -} #endif
_______________________________________________ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users