I recently had to work with Ruby's 'Open3' package, and that got
me thinking about this thread again.
I have cobbled together a few helper functions and wrappers over
System.Process that implement some of the ideas floated in the
thread. Ideas like avoiding deadlock by reading continuously from
the handles and buffering the results in memory. I've also tried
to avoid throwing exceptions, making errors explicit in the type
signatures.
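As a rough illustration of what "errors explicit in the type signatures" can mean, here is a minimal sketch that uses only `base` and `process`. It is not the process-streaming API; `runExplicit` is a hypothetical name made up for this example.

```haskell
import Control.Exception (IOException, try)
import System.Exit (ExitCode (..))
import System.Process (readProcessWithExitCode)

-- Hypothetical helper: run a command and report every failure as a Left
-- value instead of letting an exception escape.
runExplicit :: FilePath -> [String] -> IO (Either String String)
runExplicit cmd args = do
    r <- try (readProcessWithExitCode cmd args "")
    return $ case r of
        -- The executable could not be launched at all.
        Left e -> Left (show (e :: IOException))
        -- The process ran and succeeded: hand back its stdout.
        Right (ExitSuccess, out, _) -> Right out
        -- The process ran but failed: report the exit code and stderr.
        Right (ExitFailure n, _, err) ->
            Left ("exit code " ++ show n ++ ": " ++ err)

main :: IO ()
main = runExplicit "ls" ["."] >>= either putStrLn putStr
```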
The repo is at https://github.com/danidiaz/process-streaming
and some examples are at:
https://github.com/danidiaz/process-streaming/blob/milestone2/examples/Main.hs
-- stdout and stderr to different files, using pipes-safe.
example1 :: IO (Either String ((),()))
example1 = ec show $
    execute2 (proc "script1.bat" [])
             show
             (consume "stdout.log")
             (consume "stderr.log")
  where
    consume file = surely . safely . useConsumer $
        S.withFile file WriteMode toHandle
The code is not exactly well tested, I must say.
Any comments or suggestions welcome!
On Wednesday, December 4, 2013 7:27:16 PM UTC+1, Gabriel Gonzalez
wrote:
If you want to keep the buffers in memory, this is exactly
what `pipes-concurrency` does. Just use `spawn` to create a
buffer that you can write to and read from at your leisure.
It lets you specify a bounded or unlimited buffer size.
This will also make sure that consumers of the buffers
properly wait for more input when they exhaust the buffer and
terminate when the buffer is done. You don't need to keep
track of the number of bytes written to the buffer.
I'm not certain this is the best approach, yet, because I
haven't had time to think about this yet, but I just wanted
to mention this potential solution to what you just described.
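A minimal sketch of that idea, under some assumptions: it uses a recent pipes-concurrency (lower-case `unbounded`), it swaps `spawn` for its variant `spawn'` so the buffer can be sealed explicitly instead of relying on garbage collection, and `bufferHandle` is a hypothetical helper name. The shell command is only a stand-in for a real process.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.STM (atomically)
import Control.Exception (finally)
import Data.ByteString (ByteString)
import Pipes (Producer, runEffect, (>->))
import Pipes.Concurrent (fromInput, spawn', toOutput, unbounded)
import qualified Pipes.ByteString as PB
import System.IO (Handle)
import System.Process

-- Hypothetical helper: copy a handle into an in-memory mailbox on a
-- background thread, and return a Producer that reads from the mailbox.
bufferHandle :: Handle -> IO (Producer ByteString IO ())
bufferHandle h = do
    (output, input, seal) <- spawn' unbounded
    _ <- forkIO $
        -- Seal the mailbox when the handle is exhausted (or on error),
        -- so that readers terminate instead of waiting forever.
        runEffect (PB.fromHandle h >-> toOutput output)
            `finally` atomically seal
    return (fromInput input)

main :: IO ()
main = do
    (_, Just hOut, Just hErr, ph) <- createProcess
        (shell "echo to-stdout; echo to-stderr 1>&2")
            { std_out = CreatePipe, std_err = CreatePipe }
    outs <- bufferHandle hOut
    errs <- bufferHandle hErr
    -- Both streams are being drained in the background, so reading them
    -- one after the other here cannot block the child process.
    runEffect (outs >-> PB.stdout)
    runEffect (errs >-> PB.stdout)
    _ <- waitForProcess ph
    return ()
```

Replacing `unbounded` with `bounded n` would cap memory use, at the price of letting the child block again once a buffer is full.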
On 12/04/2013 04:23 PM, Daniel Díaz wrote:
To avoid the possibility of filling the output buffers and
blocking the process, while still keeping separate stdout
and stderr producers, perhaps two temporary files could be
created. Stdout would be written to one and stderr to the
other. Clients would read the temporary files as they are
being written, but would always block before reaching the
"not yet written" zone (we would ensure this by keeping
track of the number of bytes written to each file.)
Or perhaps these intermediate buffers could be kept in
memory, if they didn't grow too big.
Could this work?
On Wednesday, September 25, 2013 5:48:16 AM UTC+2, Jeremy
Shaw wrote:
On Tue, Sep 24, 2013 at 1:46 PM, John Wiegley
<[email protected]> wrote:
>>>>> Gabriel Gonzalez <[email protected]> writes:
> readProcess :: Process -> Producer (Either ByteString ByteString) (SafeT m) ()
Wouldn't it be better to give two Producers, one for
stdout and one for stdin? They can be written to at the
same time by the process, can't they? It would then seem
odd that they can only be processed in sequence.
I assume you mean one for stdout and one for *stderr*?
Alas, the unix process model is so fundamentally stupid
that I think we really need both variants. Many
command-line apps are run from the command-line where
stdout and stderr are interleaved in a somewhat
arbitrary manner. But, there is some time-based
information there -- even if there is a bit of
fuzziness. For example, an app could print several lines
of success to stdout, some error message to stderr, and
more success to stdout. So, the stuff on stderr is
presented in the context of what happened around the
same time on stdout.
If you treat them as two completely independent sources,
then you lose that temporal context.
So, I think it is useful to have a version that does
interleave the stdout/stderr in whatever order it seems
to get them. In theory you can just use partitionEithers
to separate them if you don't want them interleaved like
that. But that is not always the most convenient thing
to do. It's clear that there are times when it seems
like having stdout and stderr be separate Producers
would be the most convenient solution.
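To make the trade-off concrete, here is a small sketch with plain lists standing in for the Producer. The chunks are made up, and the convention that Left carries stderr and Right carries stdout is an assumption for this example only.

```haskell
import Data.Either (partitionEithers)

-- Made-up interleaved output: Left = stderr, Right = stdout (assumed).
interleaved :: [Either String String]
interleaved =
    [ Right "step 1 ok"
    , Right "step 2 ok"
    , Left  "warning: retrying"
    , Right "step 3 ok"
    ]

main :: IO ()
main = do
    let (errs, outs) = partitionEithers interleaved
    print errs  -- ["warning: retrying"]
    print outs  -- ["step 1 ok","step 2 ok","step 3 ok"]
```

After the split, nothing records that the warning arrived between step 2 and step 3, which is the temporal context mentioned above.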
On the other hand -- I think there is a real danger to
have two Producers, one for stdout and one for stderr.
Let's say you only care about stdout and you don't do
anything with stderr. Since you are ignoring it, nobody
is reading from stderr and now stderr is at risk of
blocking due to having a full output buffer, and the
whole process may then block. Even worse, maybe you do
care about stdout and stderr, but you try to do something
where you first write all of stdout to a file, and then
all of stderr. You could still end up blocked. If you
want to safely process stdout and stderr separately,
then I think you must do that in separate threads so
that you don't deadlock?
I think it is necessary that we always read data from
stdout and stderr when it becomes available, though we
can choose to discard one or the other if we don't
actually want it.
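A minimal sketch of that discipline, using only `base`, `bytestring` and `process`: each output handle gets its own thread that drains it to the end, so neither pipe can fill up while the other is being read. The names `drain` and `runBoth` are hypothetical, and the shell command is only a stand-in.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import qualified Data.ByteString as B
import System.Exit (ExitCode)
import System.IO (Handle)
import System.Process

-- Start draining a handle in a background thread. The returned action
-- blocks until the handle has been read to the end.
-- (Sketch only: an exception in the reader thread is not propagated.)
drain :: Handle -> IO (IO B.ByteString)
drain h = do
    box <- newEmptyMVar
    _ <- forkIO (B.hGetContents h >>= putMVar box)
    return (takeMVar box)

runBoth :: CreateProcess -> IO (ExitCode, B.ByteString, B.ByteString)
runBoth cp = do
    (_, Just hOut, Just hErr, ph) <-
        createProcess cp { std_out = CreatePipe, std_err = CreatePipe }
    -- Start both readers before waiting on either result.
    waitOut <- drain hOut
    waitErr <- drain hErr
    out <- waitOut
    err <- waitErr
    code <- waitForProcess ph
    return (code, out, err)

main :: IO ()
main = do
    (code, out, err) <- runBoth (shell "echo to-stdout; echo to-stderr 1>&2")
    print code
    B.putStr out
    B.putStr err
```

Discarding one stream then just means ignoring one of the results; the handle is still read, so the child never blocks on it.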
Now, we should also note that a similar problem exists
in the current code. If we start the process and use
only writeProcess, but not readProcess, then the process
might block trying to write output and the input will
never get read.
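The same hazard on the input side can be sketched with `cat`, which is assumed to be on the PATH. Writing a large payload to stdin from the main thread before reading stdout could stall once both pipes are full; feeding stdin from a separate thread avoids that.

```haskell
import Control.Concurrent (forkIO)
import qualified Data.ByteString as B
import System.IO (hClose)
import System.Process

main :: IO ()
main = do
    (Just hIn, Just hOut, _, ph) <- createProcess
        (proc "cat" []) { std_in = CreatePipe, std_out = CreatePipe }
    -- 1 MiB of 'x', larger than a typical pipe buffer.
    let payload = B.replicate (1024 * 1024) 120
    -- Feed stdin in the background and close it so that cat sees EOF.
    _ <- forkIO (B.hPut hIn payload >> hClose hIn)
    -- Meanwhile, read stdout to the end on the main thread.
    out <- B.hGetContents hOut
    _ <- waitForProcess ph
    print (B.length out)  -- 1048576
```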
So modeling a process as a Pipe does not work, but
modelling it as an independent Consumer and Producer is not
entirely correct either. There is, in fact, some
interaction between the Consumer and Producer ends of a
process -- but not in a way that we can really reason
about it?
Still, I feel like allowing the user to read only stdout
or only stderr is asking for more trouble than allowing
the user to call only readProcess vs only writeProcess.
Unfortunately, it is extremely easy to deadlock when
calling a unix process that streams both inputs and
outputs. I wonder if there is another way we can wrap a
process into a pipe that is safer?
- jeremy
--
You received this message because you are subscribed to the
Google Groups "Haskell Pipes" group.
To unsubscribe from this group and stop receiving emails
from it, send an email to [email protected].
To post to this group, send email to [email protected].