Re: [haskell-pipes] Producers, Purity and Resumability

Jeremy Shaw Thu, 30 Jan 2014 12:44:59 -0800

Yes -- io-streams forces everything to be done in the IO monad and uses
hidden IORefs. conduit also uses hidden IORefs for resumable streams.


But is that really the best choice?

- jeremy

On Thu, Jan 30, 2014 at 2:12 PM, Carter Schonwald
<[email protected]> wrote:
> I think this precise issue is why the Snap server's HTTP parser tooling uses
> the io-streams lib!
>
>
> On Thu, Jan 30, 2014 at 1:02 PM, Jeremy Shaw <[email protected]> wrote:
>>
>> I have been thinking about what it means to run a 'Producer'
>> twice. Specifically -- whether the Producer resumes where it left off
>> or not. I think that in general the behavior is undefined. I feel like
>> this has not been explicitly stated much -- so I am going to say it
>> now. In some sense, it should be obvious -- but when peering through
>> the haze of Pipes, StateT, and IO, the simple things can get lost.
>>
>> Consider two different cases:
>>
>>  1. a producer that produces values from a pure list
>>
>>  2. a producer that produces values from a network connection
>>
>>
>> If we run the first producer twice we will get the same answer each
>> time. If we run the second producer twice -- we will likely get
>> different results -- depending on what data is available from the
>> network stream.
>>
>> Now -- that is not entirely surprising -- one value is pure and one is
>> based on IO. So that is no different than calling a normal pure
>> function versus a normal IO function.
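
To make the contrast concrete, here is a minimal sketch that stands in for a network connection with an 'IORef' holding the values that have not been read yet. The names 'newFakeSocket', 'fromFakeSocket' and 'takeList' are hypothetical and exist only for this illustration. Like a real socket, the fake-socket Producer picks up where the previous run stopped, while a pure 'mapM_ yield' Producer gives the same values every time.

```haskell
import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import Pipes
import qualified Pipes.Prelude as P

-- Hypothetical stand-in for a socket: an IORef holding the values that
-- have "arrived" but have not been read yet. Reading removes them.
newFakeSocket :: [Int] -> IO (IORef [Int])
newFakeSocket = newIORef

-- Drain the fake socket. The output depends on hidden mutable state,
-- so a second run of this Producer continues where the first run stopped,
-- just as a network-backed Producer would.
fromFakeSocket :: IORef [Int] -> Producer Int IO ()
fromFakeSocket ref = do
  buf <- liftIO (readIORef ref)
  case buf of
    []       -> return ()
    (x : xs) -> do
      liftIO (writeIORef ref xs)  -- consume the value before yielding it
      yield x
      fromFakeSocket ref

-- Run a Producer, keeping at most the first n values it yields.
takeList :: Int -> Producer Int IO () -> IO [Int]
takeList n p = P.toListM (p >-> P.take n)
```

With 'sock <- newFakeSocket [1 .. 10]', running 'takeList 5 (fromFakeSocket sock)' twice gives [1,2,3,4,5] and then [6,7,8,9,10]. Doing the same with 'mapM_ yield [1 .. 10]' gives [1,2,3,4,5] both times.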
>>
>> But -- I think it can be easy to forget that when writing pipes
>> code. Imagine we write some pipes code that processes a network stream
>> -- and it relies on the fact that the network Producer automatically
>> resumes from where it left off.
>>
>> Now, let's pretend we want to test our code. So we create a pure
>> Producer that produces the same bytestring that the network pipe was
>> producing. Alas, our code will not work because the pure Producer does
>> not automatically resume when called multiple times.
>>
>> I think this means that we must assume, by default, that the Producer
>> does not have resumable behavior. If we want to write code that relies
>> on the resumable behavior -- then we must explicitly ensure that it
>> happens.
>>
>> In pipes-parse the resumability is handled by storing the 'Producer'
>> in 'StateT'.
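
The sketch below is hand-rolled and is not pipes-parse's actual API. 'drawS', 'takeS' and 'twoTakes' are hypothetical names. It keeps the *remaining* 'Producer' as the 'StateT' state and uses 'next' to step it, so each draw continues from wherever the last one stopped. It assumes the 'transformers' package for 'StateT'.

```haskell
import Control.Monad.Trans.State.Strict (StateT, evalStateT, get, put)
import Data.Functor.Identity (runIdentity)
import Pipes

-- Draw one value. The state is "the rest of the stream".
drawS :: Monad m => StateT (Producer a m ()) m (Maybe a)
drawS = do
  p <- get
  x <- lift (next p)
  case x of
    Left ()       -> do
      put (return ())   -- stream exhausted: leave an empty Producer behind
      return Nothing
    Right (a, p') -> do
      put p'            -- remember the remainder for the next draw
      return (Just a)

-- Draw up to n values, stopping early if the stream ends.
takeS :: Monad m => Int -> StateT (Producer a m ()) m [a]
takeS n
  | n <= 0    = return []
  | otherwise = do
      mx <- drawS
      case mx of
        Nothing -> return []
        Just x  -> (x :) <$> takeS (n - 1)

-- Two consecutive takes over a pure Producer: no IO and no IORef.
twoTakes :: [Int] -> ([Int], [Int])
twoTakes xs = runIdentity (evalStateT ((,) <$> takeS 5 <*> takeS 5) (each xs))
```

Here 'twoTakes [1 .. 10]' evaluates to ([1,2,3,4,5],[6,7,8,9,10]) entirely in 'Identity'.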
>>
>> Another alternative is to use an 'IORef'. I have an example of the
>> 'IORef' solution below.
>>
>> > module Main where
>>
>> > import           Data.IORef (IORef, newIORef, readIORef, writeIORef)
>> > import           Pipes
>> > import qualified Pipes.Prelude as P
>>
>> Here is our pure Producer:
>>
>> > pure10 :: (Monad m) => Producer Int m ()
>> > pure10 = mapM_ yield [1..10]
>>
>> And here is a function which uses a Producer twice.
>>
>> > take5_twice :: Show a => Producer a IO () -> IO ()
>> > take5_twice p =
>> >     do runEffect $ p >-> P.take 5 >-> P.print
>> >        putStrLn "<<Intermission>>"
>> >        runEffect $ p >-> P.take 5 >-> P.print
>>
>> Note that we have limited ability to reason about the results, since we do
>> not know whether the 'Producer' is resumable or not.
>>
>> If we run 'take5_twice' using our pure Producer:
>>
>> > pure10_test :: IO ()
>> > pure10_test =
>> >     take5_twice pure10
>>
>> it will restart from 1 each time:
>>
>>     > pure10_test
>>     1
>>     2
>>     3
>>     4
>>     5
>>     <<Intermission>>
>>     1
>>     2
>>     3
>>     4
>>     5
>>
>> Here is a (not very generalized) function that uses an 'IORef' to
>> store the current position in the 'Producer' -- similar to how
>> 'StateT' works:
>>
>> > resumable :: Producer Int IO () -> IO (Producer Int IO ())
>> > resumable p0 =
>> >    do ref <- liftIO $ newIORef p0
>> >       return (go ref)
>> >    where
>> >      go :: IORef (Producer Int IO ()) -> Producer Int IO ()
>> >      go ref =
>> >          do p <- liftIO $ readIORef ref
>> >             x <- liftIO $ next p
>> >             case x of
>> >               (Right (i, p')) ->
>> >                   do liftIO $ writeIORef ref p'
>> >                      yield i
>> >                      go ref
>> >               (Left ()) ->
>> >                   do liftIO $ writeIORef ref (return ())
>> >                      return ()
>>
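
The following is one possible generalization of 'resumable'. It covers any element type, any base monad with a 'MonadIO' instance, and any return value. This is a sketch under those assumptions, and 'resumableG' and 'demo' are hypothetical names. It still pays for one 'IORef' read and write per element. When the stream ends, it stores 'return r', so later runs immediately return the same 'r'.

```haskell
import Data.IORef (newIORef, readIORef, writeIORef)
import Pipes
import qualified Pipes.Prelude as P

-- Hypothetical generalization of 'resumable': any element type 'a',
-- any base monad 'm' with a 'MonadIO' instance, any return value 'r'.
resumableG :: MonadIO m => Producer a m r -> IO (Producer a m r)
resumableG p0 = do
  ref <- newIORef p0                   -- holds "the rest of the stream"
  return (go ref)
  where
    go ref = do
      p <- liftIO (readIORef ref)
      x <- lift (next p)               -- step the stored Producer in 'm'
      case x of
        Right (a, p') -> do
          liftIO (writeIORef ref p')   -- save the remainder before yielding
          yield a
          go ref
        Left r -> do
          liftIO (writeIORef ref (return r))  -- later runs just return 'r'
          return r

-- Example use: two 'P.take 5' runs over one shared, resumable Producer.
demo :: IO ([Int], [Int])
demo = do
  p  <- resumableG (each [1 .. 10])
  xs <- P.toListM (p >-> P.take 5)
  ys <- P.toListM (p >-> P.take 5)
  return (xs, ys)
```

'demo' returns ([1,2,3,4,5],[6,7,8,9,10]), which matches the 'impure10_test' output shown below.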
>> Now if we call 'take5_twice' with our resumable Producer:
>>
>> > impure10_test :: IO ()
>> > impure10_test =
>> >     do p <- resumable pure10
>> >        take5_twice p
>>
>> Here we see the resuming behavior:
>>
>>     > impure10_test
>>     1
>>     2
>>     3
>>     4
>>     5
>>     <<Intermission>>
>>     6
>>     7
>>     8
>>     9
>>     10
>>
>> If we call 'resumable' on a 'Producer' that already has resumable
>> behavior -- it will still work. We can simulate that by calling resumable
>> twice:
>>
>> > twice_resumable :: IO ()
>> > twice_resumable =
>> >     do p0 <- resumable pure10
>> >        p  <- resumable p0
>> >        take5_twice p
>>
>>
>>     > twice_resumable
>>     1
>>     2
>>     3
>>     4
>>     5
>>     <<Intermission>>
>>     6
>>     7
>>     8
>>     9
>>     10
>>
>> Of course, we now have the overhead of *two* 'IORef' based Producers.
>>
>> So we are now left with some questions of style.
>>
>> If we are writing something like an HTTP server -- we can assume that
>> most of the time we are going to be working with a 'Producer' based on a
>> resumable source like a network stream. So, by using the inherent
>> resumability we can presumably get lower overhead and higher
>> performance. If we need to use the code with a non-resumable Producer
>> then we can use a function like 'resumable' to fake it.
>>
>> This is somewhat distasteful in two ways though.
>>
>>  (1) It forces everything to be in the IO monad -- even when
>>      everything could actually be pure.
>>
>>  (2) It relies on the resumability of the Producer -- but there is no
>>      enforcement or indication of that in the type system.
>>
>> In some sense -- being in the IO monad is not really a big deal since any
>> practical web server needs to be anyway. On the other hand -- creating
>> a nice pure streaming abstraction and sticking an ugly IORef in it seems
>> a little sad.
>>
>> The alternative is to run all our code inside a 'StateT'. Since the
>> 'StateT' takes care of resuming, we do not have to worry about whether the
>> underlying Producer does. But now we always pay for being inside a
>> 'StateT', even when we don't really need to be -- so we have a more
>> complicated set of types to work with and more potential overhead.
>>
>> The upside is that our pure code stays pure. We only introduce the IO
>> monad when IO is really used.
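
Here is a minimal sketch of 'take5_twice' in that style. It assumes 'transformers' for 'StateT', and 'splitN', 'take5TwiceS' and 'take5TwicePure' are hypothetical names. The state is "the rest of the stream", so the second take resumes regardless of the underlying Producer. The code is also polymorphic in the base monad, so it can run in 'Identity' for tests and in 'IO' for a real connection.

```haskell
import Control.Monad.Trans.State.Strict (StateT (..), evalStateT)
import Data.Functor.Identity (runIdentity)
import Pipes

-- Split off (at most) the first n values, returning them together with
-- the remainder of the Producer.
splitN :: Monad m => Int -> Producer a m r -> m ([a], Producer a m r)
splitN n p
  | n <= 0    = return ([], p)
  | otherwise = do
      x <- next p
      case x of
        Left r        -> return ([], return r)   -- stream ended early
        Right (a, p') -> do
          (as, rest) <- splitN (n - 1) p'
          return (a : as, rest)

-- take5_twice, StateT style: each step consumes from the stored remainder.
take5TwiceS :: Monad m => StateT (Producer a m r) m ([a], [a])
take5TwiceS = do
  xs <- StateT (splitN 5)
  ys <- StateT (splitN 5)
  return (xs, ys)

-- Pure instance: no IO and no IORef anywhere.
take5TwicePure :: [Int] -> ([Int], [Int])
take5TwicePure xs = runIdentity (evalStateT take5TwiceS (each xs))
```

'take5TwicePure [1 .. 10]' gives ([1,2,3,4,5],[6,7,8,9,10]). Over a network-backed Producer, the same 'take5TwiceS' would run with 'evalStateT' in 'IO'.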
>>
>> This is the major decision blocking hyperdrive (my pipes-based HTTP
>> server) at the moment.
>>
>> Any thoughts?
>>
>> - jeremy
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "Haskell Pipes" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To post to this group, send email to [email protected].