Sorry for the delay in responding to this. I needed some time to think about what API would resolve the concurrency issues that Jeremy Shaw raised in the last thread on this subject.

I think a lot of those issues can be handled by using `Input`s and `Output`s from `pipes-concurrency` instead of `Producer`s and `Consumer`s.

For example, let's say that within the callback our `stdout` and `stderr` have the following types:

    stdout :: Input ByteString
    stderr :: Input ByteString

Then it's easy to merge the two streams and preserve their relative ordering by using the `Alternative` instance for `Input`s:

    stdBoth :: Input (Either ByteString ByteString)
    stdBoth = fmap Left stderr <|> fmap Right stdout
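
As a rough illustration of the merge (not a proposed API), here is a small sketch in which two mailboxes created with `spawn` stand in for the process's `stdout` and `stderr`. The names `toStdout`, `toStderr`, and `emit` are made up for the example, and `Unbounded` is the constructor spelling I'm assuming here; other releases of `pipes-concurrency` may spell it `unbounded`.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Control.Applicative ((<|>))
import Control.Monad (void)
import Data.ByteString (ByteString)
import Pipes.Concurrent

main :: IO ()
main = do
    -- Two mailboxes standing in for the handles of a real process.
    -- (Assumption: other releases may need `unbounded` instead.)
    (toStdout, stdout) <- spawn Unbounded
    (toStderr, stderr) <- spawn Unbounded

    let stdBoth :: Input (Either ByteString ByteString)
        stdBoth = fmap Left stderr <|> fmap Right stdout

        -- Hypothetical helper: pretend the process wrote one chunk,
        -- then read it back through the merged Input.
        emit output chunk = do
            void $ atomically $ send output chunk
            atomically (recv stdBoth) >>= print

    emit toStdout "step 1 ok\n"
    emit toStderr "warning: something odd\n"
    emit toStdout "step 2 ok\n"
```

Each chunk comes back tagged with `Left` or `Right` according to the stream it was written to. As far as I can tell, `<|>` tries the left `Input` first, so if the reader falls behind and both mailboxes already hold data, the ordering is only approximate.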

The other advantage of this `pipes-concurrency` approach to modeling the handles is that the user can pass a `Buffer` specifying how to handle buffering between the process and the Haskell program. This allows the user to tune how much input to buffer before the process should block, using `Unbounded` or `Bounded` buffers, for example.
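
To make the buffering point concrete, here is a sketch of how a handle could be drained into a mailbox with a user-supplied `Buffer`. `handleInput` is a hypothetical helper, not part of any library, and I'm assuming `fromHandle` from `pipes-bytestring` and `createProcess` from `process`; `ls -l` is just a placeholder command.

```haskell
import Control.Applicative ((<|>))
import Data.ByteString (ByteString)
import Pipes
import Pipes.Concurrent
import qualified Pipes.ByteString as PB
import qualified Pipes.Prelude as P
import System.IO (Handle)
import System.Process

-- Hypothetical helper: a background thread copies the handle into a
-- mailbox whose buffering policy is chosen by the caller.
handleInput :: Buffer ByteString -> Handle -> IO (Input ByteString)
handleInput buffer h = do
    (output, input) <- spawn buffer
    _ <- forkIO $ do
        runEffect $ PB.fromHandle h >-> toOutput output
        performGC  -- lets the mailbox notice that the writer is finished
    return input

main :: IO ()
main = do
    (_, Just hOut, Just hErr, ph) <- createProcess
        (proc "ls" ["-l"]) { std_out = CreatePipe, std_err = CreatePipe }

    -- At most 16 chunks of stdout are held before the copying thread
    -- (and eventually the process) blocks; stderr is buffered without
    -- limit. Other releases may spell these `bounded` / `unbounded`.
    stdout <- handleInput (Bounded 16) hOut
    stderr <- handleInput Unbounded    hErr

    runEffect $ fromInput (fmap Left stderr <|> fmap Right stdout) >-> P.print
    _ <- waitForProcess ph
    return ()
```

Note that the bound counts chunks, not bytes, so the memory actually held depends on the chunk size that `fromHandle` reads.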

I will try to write up a sketch of what I have in mind.

On 02/17/2014 05:10 AM, Daniel Díaz wrote:
I recently had to work with Ruby's 'Open3' package, and that got me thinking about this thread again.

I have cobbled together a few helper functions and wrappers over System.Process that implement some of the ideas floated in the thread. Ideas like avoiding deadlock by reading continuously from the handles and buffering the results in memory. I've also tried to avoid throwing exceptions, making errors explicit in the type signatures.

The repo is at https://github.com/danidiaz/process-streaming

and some examples at: https://github.com/danidiaz/process-streaming/blob/milestone2/examples/Main.hs

    -- stdout and stderr to different files, using pipes-safe.
    example1 :: IO (Either String ((),()))
    example1 = ec show $
       execute2 (proc "script1.bat" [])
                show
                (consume "stdout.log")
                (consume "stderr.log")
       where
       consume file = surely . safely . useConsumer $
                          S.withFile file WriteMode toHandle

The code is not exactly well tested, I must say.

Any comments or suggestions welcome!

On Wednesday, December 4, 2013 7:27:16 PM UTC+1, Gabriel Gonzalez wrote:

    If you want to keep the buffers in memory, this is exactly what
    `pipes-concurrency` does.  Just use `spawn` to create a buffer
    that you can write to and read from at your leisure.  It lets you
    specify a bounded or unlimited buffer size.

    This will also make sure that consumers of the buffers properly
    wait for more input when they exhaust the buffer and terminate
    when the buffer is done.  You don't need to keep track of the
    number of bytes written to the buffer.

    I'm not certain this is the best approach yet, because I haven't
    had time to think it through, but I just wanted to mention
    this potential solution to what you just described.

    On 12/04/2013 04:23 PM, Daniel Díaz wrote:
    To avoid the possibility of filling the output buffers and
    blocking the process, while still keeping separate stdout and
    stderr producers, perhaps two temporary files could be created.
    Stdout would be written to one and stderr to the other. Clients
    would read the temporary files as they are being written, but
    would always block before reaching the "not yet written" zone (we
    would ensure this by keeping track of the number of bytes written
    to each file.)

    Or perhaps these intermediate buffers could be kept in memory, if
    they didn't grow too big.

    Could this work?

    On Wednesday, September 25, 2013 5:48:16 AM UTC+2, Jeremy Shaw
    wrote:

        On Tue, Sep 24, 2013 at 1:46 PM, John Wiegley
        <[email protected]> wrote:

            >>>>> Gabriel Gonzalez <[email protected]> writes:

            >     readProcess :: Process -> Producer (Either ByteString ByteString) (SafeT m) ()

            Wouldn't it be better to give two Producers, one for
            stdout and one for stdin?  They can be written to at the
            same time by the process, can't they?  It would then seem
            odd that they can only be processed in sequence.


        I assume you mean one for stdout and one for *stderr*?

        Alas, the unix process model is so fundamentally stupid that
        I think we really need both variants. Many command-line apps
        are run from the command-line where stdout and stderr are
        interleaved in a somewhat arbitrary manner. But, there is
        some time-based information there -- even if there is a bit
        of fuzziness. For example, an app could print several lines
        of success to stdout, some error message to stderr, and more
        success to stdout. So, the stuff on stderr is presented in
        the context of what happened around the same time on stdout.

        If you treat them as two completely independent sources, then
        you lose that temporal context.

        So, I think it is useful to have a version that does
        interleave the stdout/stderr in whatever order it seems to
        get them. In theory you can just use partitionEithers to
        separate them if you don't want them interleaved like that.
        But that is not always the most convenient thing to do. It's
        clear that there are times when it seems like having stdout
        and stderr be separate Producers would be the most convenient
        solution.

        On the other hand -- I think there is a real danger in having
        two Producers, one for stdout and one for stderr. Let's say
        you only care about stdout and you don't do anything with
        stderr. Since you are ignoring it, nobody is reading from
        stderr and now stderr is at risk of blocking due to having a
        full output buffer, and the whole process may then block.
        Even worse, maybe you do care about stdout and stderr, but
        you try to do something where you first write all of stdout
        to a file, and then all of stderr. You could still end up
        blocked. If you want to safely process stdout and stderr
        separately, then I think you must do that in separate threads
        so that you don't deadlock?

        I think it is necessary that we always read data from stdout
        and stderr when it becomes available, though we can choose to
        discard one or the other if we don't actually want it.

        Now, we should also note that a similar problem exists in the
        current code. If we start the process and use only
        writeProcess, but not readProcess, then the process might
        block trying to write output and the input will never get read.

        So modeling a process as a Pipe does not work, but modeling
        it as an independent Consumer and Producer is not entirely
        correct either. There is, in fact, some interaction between
        the Consumer and Producer ends of a process -- but not in a
        way that we can really reason about it?

        Still, I feel like allowing the user to read only stdout or
        only stderr is asking for more trouble than allowing the user
        to call only readProcess vs only writeProcess.

        Unfortunately, it is extremely easy to deadlock when calling
        a unix process that streams both inputs and outputs. I wonder
        if there is another way we can wrap a process into a pipe
        that is safer?

        - jeremy




    --
    You received this message because you are subscribed to the
    Google Groups "Haskell Pipes" group.
    To unsubscribe from this group and stop receiving emails from it,
    send an email to [email protected].
    To post to this group, send email to [email protected].
