Re: [haskell-pipes] Producers, Purity and Resumability

Dan Burton Thu, 30 Jan 2014 13:31:45 -0800

>
> And here is a function which uses a Producer twice.


> > take5_twice :: Show a => Producer a IO () -> IO ()
> > take5_twice p =
> >     do runEffect $ p >-> P.take 5 >-> P.print
> >        putStrLn "<<Intermission>>"
> >        runEffect $ p >-> P.take 5 >-> P.print


The "resumable" pipe resumes because it pushes the pipe's state tracking
down into the IO monad. So when you stitch the code together, you get:

    do -- an IO do block
      p <- resumable pureP -- allocate IORef
      runEffect $ p >-> P.take 5 >-> P.print -- read and update the IORef
      putStrLn "<<Intermission>>"
      runEffect $ p >-> P.take 5 >-> P.print -- read and update the IORef

In short, *pipes do not have resumability*, so what you have done is *add*
resumability by using specific capabilities of the underlying monad (in this
case IO, but you could do the same thing with ST or State).
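
A minimal sketch of the State variant, assuming the transformers package is
available. The names `takeSome` and `twoBatches` are hypothetical helpers
written for this illustration and are not part of pipes. The leftover
Producer lives in a `StateT` layer instead of an IORef, so the second batch
picks up where the first one stopped:

```haskell
import           Control.Monad.Trans.Class        (lift)
import           Control.Monad.Trans.State.Strict (StateT, evalStateT, get, put)
import           Pipes                            (Producer, next, yield)

-- The same pure Producer as in the quoted message.
pure10 :: Monad m => Producer Int m ()
pure10 = mapM_ yield [1 .. 10]

-- Hypothetical helper (not part of pipes): pull up to n values from the
-- Producer held in the state and store the unconsumed remainder back.
takeSome :: Monad m => Int -> StateT (Producer a m ()) m [a]
takeSome n
    | n <= 0    = return []
    | otherwise = do
        p <- get
        r <- lift (next p)
        case r of
            -- Exhausted: remember an empty Producer so nothing is re-run.
            Left ()       -> put (return ()) >> return []
            Right (a, p') -> do
                put p'
                rest <- takeSome (n - 1)
                return (a : rest)

-- Two consecutive batches of five, threaded through the same state.
twoBatches :: Monad m => m ([Int], [Int])
twoBatches = evalStateT ((,) <$> takeSome 5 <*> takeSome 5) pure10

main :: IO ()
main = do
    (xs, ys) <- twoBatches
    print xs                      -- [1,2,3,4,5]
    putStrLn "<<Intermission>>"
    print ys                      -- [6,7,8,9,10]
```

Nothing here needs IO: `twoBatches` works in any monad, and the resumption is
visible in the `StateT` type rather than hidden in a mutable reference.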

You seem to imply that this is a problem with pipes generally, because
running the same pipe multiple times might have unexpected effects. I
instead see it as a problem with *adding resumability to pipes*, or other
such dangerous effects, and *that* is what makes a given pipe behave in
potentially unexpected ways.

> ... whether the Producer resumes where it left off
> or not. I think that in general the behavior is undefined


This is incorrect. When you use pipe composition, a Producer *always* runs
sequentially, for as long as downstream pipes pull from it in a pipeline.
When you use the same pipe in two different pipelines, a Producer *always*
starts over again from the beginning in each pipeline. But when you have a
Producer that looks at an IORef or makes a network call or performs some
sort of effect, then "starting over from the beginning" won't necessarily
produce the same results. The key difference is that it starts over *with
the same set of instructions*. If its instructions tell it to look at some
outside state and base its behavior off of that, well then all bets are off
for "intuitive" behavior, unless you are able to intuitively keep track of
that outside state. Intuitive reasoning about a given pipe's behavior is
only as good as the intuitive reasoning for the underlying monadic effects
which it is built upon. If you want to have stronger guarantees, then you
must use a more principled monad.
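
A small sketch of "the same set of instructions, different results". The
names `counting` and `firstThree` are hypothetical, made up for this
illustration. The Producer is restarted from the beginning in both
pipelines, but its instructions say "read and bump an outside counter", so
the second run sees whatever the first run left behind:

```haskell
import           Control.Monad.Trans.Class (lift)
import           Data.IORef                (IORef, atomicModifyIORef', newIORef)
import           Pipes                     (Producer, each, yield, (>->))
import qualified Pipes.Prelude             as P

-- Hypothetical Producer: each step reads and increments outside state.
counting :: IORef Int -> Producer Int IO ()
counting ref = loop
  where
    loop = do
        n <- lift (atomicModifyIORef' ref (\i -> (i + 1, i)))
        yield n
        loop

-- Run a fresh pipeline over the given Producer and collect three values.
firstThree :: Producer Int IO () -> IO [Int]
firstThree p = P.toListM (p >-> P.take 3)

main :: IO ()
main = do
    ref <- newIORef 0
    let p = counting ref
    firstThree p >>= print            -- [0,1,2]
    firstThree p >>= print            -- [3,4,5]  same instructions, new state
    -- A Producer with no outside state gives the same answer every time.
    firstThree (each [0 ..]) >>= print  -- [0,1,2]
    firstThree (each [0 ..]) >>= print  -- [0,1,2]
```

In both cases the Producer starts over. Only the one whose instructions
consult the IORef appears to "resume".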


-- Dan Burton


On Thu, Jan 30, 2014 at 12:43 PM, Jeremy Shaw <[email protected]> wrote:

> Yes -- io-streams forces everything to be done in the IO monad and uses
> hidden IORefs. conduit also uses hidden IORefs for resumable streams.
>
> But is that really the best choice?
>
> - jeremy
>
> On Thu, Jan 30, 2014 at 2:12 PM, Carter Schonwald
> <[email protected]> wrote:
> > I think this precise issue is why the snap server http parser tooling uses
> > the io-streams lib!
> >
> >
> > On Thu, Jan 30, 2014 at 1:02 PM, Jeremy Shaw <[email protected]> wrote:
> >>
> > >> I have been thinking about what it means to run a 'Producer'
> > >> twice. Specifically -- whether the Producer resumes where it left off
> > >> or not. I think that in general the behavior is undefined. I feel like
> > >> this has not been explicitly stated much -- so I am going to say it
> > >> now. In some sense, it should be obvious -- but when peering through
> > >> the haze of Pipes, StateT, and IO, the simple things can get lost.
> >>
> >> Consider two different cases:
> >>
> >>  1. a producer that produces values from a pure list
> >>
> >>  2. a producer that produces values from a network connection
> >>
> >>
> >> If we run the first producer twice we will get the same answer each
> >> time. If we run the second producer twice -- we will likely get
> >> different results -- depending on what data is available from the
> >> network stream.
> >>
> >> Now -- that is not entirely surprising -- one value is pure and one is
> >> based on IO. So that is no different than calling a normal pure
> >> function versus a normal IO function.
> >>
> >> But -- I think it can be easy to forget that when writing pipes
> >> code. Imagine we write some pipes code that processes a network stream
> >> -- and it relies on the fact that the network Producer automatically
> >> resumes from where it left off.
> >>
> >> Now, let's pretend we want to test our code. So we create a pure
> >> Producer that produces the same bytestring that the network pipe was
> >> producing. Alas, our code will not work because the pure Producer does
> >> not automatically resume when called multiple times.
> >>
> >> I think this means that we must assume, by default, that the Producer
> >> does not have resumable behavior. If we want to write code that relies
> > >> on the resumable behavior -- then we must explicitly ensure that it
> >> happens.
> >>
> >> In pipes-parse the resumability is handled by storing the 'Producer'
> >> in 'StateT'.
> >>
> >> Another alternative is to use an 'IORef'. I have an example of the
> >> 'IORef' solution below.
> >>
> >> > module Main where
> >>
> > >> > import Data.IORef             (IORef, newIORef, readIORef, writeIORef)
> >> > import           Pipes
> >> > import qualified Pipes.Prelude as P
> >>
> >> Here is our pure Producer:
> >>
> >> > pure10 :: (Monad m) => Producer Int m ()
> >> > pure10 = mapM_ yield [1..10]
> >>
> >> And here is a function which uses a Producer twice.
> >>
> >> > take5_twice :: Show a => Producer a IO () -> IO ()
> >> > take5_twice p =
> >> >     do runEffect $ p >-> P.take 5 >-> P.print
> >> >        putStrLn "<<Intermission>>"
> >> >        runEffect $ p >-> P.take 5 >-> P.print
> >>
> > >> Note that we have limited ability to reason about the results since we
> > >> do not know if the 'Producer' is resumable or not.
> >>
> >> If we run 'take5_twice' using our pure Producer:
> >>
> >> > pure10_test :: IO ()
> >> > pure10_test =
> >> >     take5_twice pure10
> >>
> >> it will restart from 1 each time:
> >>
> >>     > pure10_test
> >>     1
> >>     2
> >>     3
> >>     4
> >>     5
> >>     <<Intermission>>
> >>     1
> >>     2
> >>     3
> >>     4
> >>     5
> >>
> >> Here is a (not very generalized) function that uses an 'IORef' to
> >> store the current position in the 'Producer' -- similar to how
> >> 'StateT' works:
> >>
> >> > resumable :: Producer Int IO () -> IO (Producer Int IO ())
> >> > resumable p0 =
> >> >    do ref <- liftIO $ newIORef p0
> >> >       return (go ref)
> >> >    where
> >> >      go :: IORef (Producer Int IO ()) -> Producer Int IO ()
> >> >      go ref =
> >> >          do p <- liftIO $ readIORef ref
> >> >             x <- liftIO $ next p
> >> >             case x of
> >> >               (Right (i, p')) ->
> >> >                   do liftIO $ writeIORef ref p'
> >> >                      yield i
> >> >                      go ref
> >> >               (Left ()) ->
> >> >                   do liftIO $ writeIORef ref (return ())
> >> >                      return ()
> >>
> >> Now if we call 'take5_twice' with our resumable Producer:
> >>
> >> > impure10_test :: IO ()
> >> > impure10_test =
> >> >     do p <- resumable pure10
> >> >        take5_twice p
> >>
> >> Here we see the resuming behavior:
> >>
> >>     > impure10_test
> >>     1
> >>     2
> >>     3
> >>     4
> >>     5
> >>     <<Intermission>>
> >>     6
> >>     7
> >>     8
> >>     9
> >>     10
> >>
> > >> If we call 'resumable' on a 'Producer' that already has resumable
> > >> behavior -- it will still work. We can simulate that by calling
> > >> resumable twice:
> >>
> >> > twice_resumable :: IO ()
> >> > twice_resumable =
> >> >     do p0 <- resumable pure10
> >> >        p  <- resumable p0
> >> >        take5_twice p
> >>
> >>
> >>     > twice_resumable
> >>     1
> >>     2
> >>     3
> >>     4
> >>     5
> >>     <<Intermission>>
> >>     6
> >>     7
> >>     8
> >>     9
> >>     10
> >>
> >> Of course, we now have the overhead of *two* 'IORef' based Producers.
> >>
> >> So we are now left with some questions of style.
> >>
> >> If we are writing something like an HTTP server -- we can assume that
> > >> most of the time we are going to be working with a 'Producer' based on a
> >> resumable source like a network stream. So, by using the inherent
> >> resumability we can presumably get lower overhead and higher
> >> performance. If we need to use the code with a non-resumable Producer
> >> then we can use a function like 'resumable' to fake it.
> >>
> >> This is somewhat distasteful in two ways though.
> >>
> >>  (1) It forces everything to be in the IO monad -- even when
> >>      everything could actually be pure.
> >>
> >>  (2) it relies on the resumability of the Producer -- but there is no
> >>      enforcement or indication of that in the type system.
> >>
> > >> In some sense -- being in the IO monad is not really a big deal since
> > >> any practical web server needs to be anyway. On the other hand -- creating
> >> a nice pure streaming abstraction and sticking an ugly IORef in it seems
> >> a little sad.
> >>
> >> The alternative is to run all our code inside a 'StateT'. Since the
> >> 'StateT' takes care of resuming we do not have to worry if the
> > >> underlying Producer does or not. But now we always have the overhead
> > >> of being inside a 'StateT' even when we don't really need to be -- so we
> >> have a more complicated set of types to work with and more potential
> >> overhead.
> >>
> >> The upside is that our pure code stays pure. We only introduce the IO
> >> monad when IO is really used.
> >>
> > >> This is the major decision blocking hyperdrive at the
> > >> moment. (hyperdrive is my pipes-based HTTP server).
> >>
> >> Any thoughts?
> >>
> >> - jeremy
> >>
> >> --
> > >> You received this message because you are subscribed to the Google Groups
> > >> "Haskell Pipes" group.
> > >> To unsubscribe from this group and stop receiving emails from it, send an
> > >> email to [email protected].
> > >> To post to this group, send email to [email protected].
> >
> >
> > --
> > You received this message because you are subscribed to the Google Groups
> > "Haskell Pipes" group.
> > To unsubscribe from this group and stop receiving emails from it, send an
> > email to [email protected].
> > To post to this group, send email to [email protected].
>
> --
> You received this message because you are subscribed to the Google Groups
> "Haskell Pipes" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
>

-- 
You received this message because you are subscribed to the Google Groups 
"Haskell Pipes" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].

Re: [haskell-pipes] Producers, Purity and Resumability

Reply via email to