I think this precise issue is why the snap server http parser tooling uses the iostreams lib!
On Thu, Jan 30, 2014 at 1:02 PM, Jeremy Shaw <[email protected]> wrote: > I have been thinking about what it means to run a 'Producer' > twice. Specifically -- whether the Producer resumes where it left of > or not. I think that in general the behavior is undefined. I feel like > this has not been explicitly stated much -- so I am going to say it > now. In some sense, it should be obvious -- but when peering through > the haze of Pipes, StateT, and IO, the simple things can get lost. > > Consider two different cases: > > 1. a producer that produces values from a pure list > > 2. a producer that produces values from a network connection > > > If we run the first producer twice we will get the same answer each > time. If we run the second producer twice -- we will likely get > different results -- depending on what data is available from the > network stream. > > Now -- that is not entirely surprising -- one value is pure and one is > based on IO. So that is no different than calling a normal pure > function versus a normal IO function. > > But -- I think it can be easy to forget that when writing pipes > code. Imagine we write some pipes code that processes a network stream > -- and it relies on the fact that the network Producer automatically > resumes from where it left off. > > Now, let's pretend we want to test our code. So we create a pure > Producer that produces the same bytestring that the network pipe was > producing. Alas, our code will not work because the pure Producer does > not automatically resume when called multiple times. > > I think this means that we must assume, by default, that the Producer > does not have resumable behavior. If we want to write code that relies > on the resumable behavior -- then we must explictly ensure that it > happens. > > In pipes-parse the resumability is handled by storing the 'Producer' > in 'StateT'. > > Another alternative is to use an 'IORef'. I have an example of the > 'IORef' solution below. > > > module Main where > > > import Data.IORef (IORef(..), newIORef, readIORef, > writeIORef) > > import Pipes > > import qualified Pipes.Prelude as P > > Here is our pure Producer: > > > pure10 :: (Monad m) => Producer Int m () > > pure10 = mapM_ yield [1..10] > > And here is a function which uses a Producer twice. > > > take5_twice :: Show a => Producer a IO () -> IO () > > take5_twice p = > > do runEffect $ p >-> P.take 5 >-> P.print > > putStrLn "<<Intermission>>" > > runEffect $ p >-> P.take 5 >-> P.print > > Note that we have limited ability reason about the results since we do > not know if the 'Producer' is resumable or not. > > If we run 'take5_twice' using our pure Producer: > > > pure10_test :: IO () > > pure10_test = > > take5_twice pure10 > > it will restart from 1 each time: > > > pure10_test > 1 > 2 > 3 > 4 > 5 > <<Intermission>> > 1 > 2 > 3 > 4 > 5 > > Here is a (not very generalized) function that uses an 'IORef' to > store the current position in the 'Producer' -- similar to how > 'StateT' works: > > > resumable :: Producer Int IO () -> IO (Producer Int IO ()) > > resumable p0 = > > do ref <- liftIO $ newIORef p0 > > return (go ref) > > where > > go :: IORef (Producer Int IO ()) -> Producer Int IO () > > go ref = > > do p <- liftIO $ readIORef ref > > x <- liftIO $ next p > > case x of > > (Right (i, p')) -> > > do liftIO $ writeIORef ref p' > > yield i > > go ref > > (Left ()) -> > > do liftIO $ writeIORef ref (return ()) > > return () > > Now if we call 'take5_twice' with our resumable Producer: > > > impure10_test :: IO () > > impure10_test = > > do p <- resumable pure10 > > take5_twice p > > Here we see the resuming behavior: > > > impure10_test > 1 > 2 > 3 > 4 > 5 > <<Intermission>> > 6 > 7 > 8 > 9 > 10 > > If we call 'resumable' on a 'Producer' that already has resumable > behavior -- it will still work. We can simulate that by calling resumable > twice: > > > twice_resumable :: IO () > > twice_resumable = > > do p0 <- resumable pure10 > > p <- resumable p0 > > take5_twice p > > > > twice_resumable > 1 > 2 > 3 > 4 > 5 > <<Intermission>> > 6 > 7 > 8 > 9 > 10 > > Of course, we now have the overhead of *two* 'IORef' based Producers. > > So we are now left with some questions of style. > > If we are writing something like an HTTP server -- we can assume that > most of the time we are going to working with a 'Producer' based on a > resumable source like a network stream. So, by using the inherent > resumability we can presumably get lower overhead and higher > performance. If we need to use the code with a non-resumable Producer > then we can use a function like 'resumable' to fake it. > > This is somewhat distasteful in two ways though. > > (1) It forces everything to be in the IO monad -- even when > everything could actually be pure. > > (2) it relies on the resumability of the Producer -- but there is no > enforcement or indication of that in the type system. > > In some sense -- being in the IO monad is not really a big deal since any > practical web server needs to be anyway. On the other hand -- creating > a nice pure streaming abstraction and sticking an ugly IORef in it seems > a little sad. > > The alternative is to run all our code inside a 'StateT'. Since the > 'StateT' takes care of resuming we do not have to worry if the > underlying Producer does or not. But.. now we always have the overhead > of being inside a 'StateT' even we don't really need to be -- so we > have a more complicated set of types to work with and more potential > overhead. > > The upside is that our pure code stays pure. We only introduce the IO > monad when IO is really used. > > This is the major decision blocking hyperdrive at the > moment. (hyperdrive is my pipes based HTTP server). > > Any thoughts? > > - jeremy > > -- > You received this message because you are subscribed to the Google Groups > "Haskell Pipes" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to [email protected]. > To post to this group, send email to [email protected]. > -- You received this message because you are subscribed to the Google Groups "Haskell Pipes" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To post to this group, send email to [email protected].
