Alright, I wrote up what I had in mind and you can find my draft here:
https://github.com/Gabriel439/pipes-process
On 02/17/2014 05:10 AM, Daniel Díaz wrote:
I recently had to work with Ruby's 'Open3' package, and that got me
thinking about this thread again.
I have cobbled together a few helper functions and wrappers over
System.Process that implement some of the ideas floated in the thread.
These include avoiding deadlock by reading continuously from the handles
and buffering the results in memory. I've also tried to avoid
throwing exceptions, making errors explicit in the type signatures.
The repo is at https://github.com/danidiaz/process-streaming
and there are some examples at:
https://github.com/danidiaz/process-streaming/blob/milestone2/examples/Main.hs
    -- stdout and stderr to different files, using pipes-safe.
    example1 :: IO (Either String ((),()))
    example1 = ec show $
        execute2 (proc "script1.bat" [])
                 show
                 (consume "stdout.log")
                 (consume "stderr.log")
      where
        consume file = surely . safely . useConsumer $
            S.withFile file WriteMode toHandle
The code is not exactly well tested, I must say.
Any comments or suggestions welcome!
On Wednesday, December 4, 2013 7:27:16 PM UTC+1, Gabriel Gonzalez wrote:
If you want to keep the buffers in memory, this is exactly what
`pipes-concurrency` does. Just use `spawn` to create a buffer
that you can write to and read from at your leisure. It lets you
specify a bounded or unlimited buffer size.
This will also make sure that consumers of the buffers properly
wait for more input when they exhaust the buffer and terminate
when the buffer is done. You don't need to keep track of the
number of bytes written to the buffer.
I'm not certain this is the best approach yet, because I haven't
had time to think it through, but I just wanted to mention
this potential solution to what you just described.
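
A minimal sketch of the buffering idea above, assuming a recent `pipes-concurrency` that exports `spawn'` and `unbounded` (older releases spell the buffer argument differently). The `fakeOutput` list and the `bufferAndDrain` helper are hypothetical stand-ins for a real process's output; nothing here is taken from either repository mentioned in this thread.

```haskell
import Pipes (each, runEffect, (>->))
import Pipes.Concurrent
    (atomically, forkIO, fromInput, spawn', toOutput, unbounded)
import qualified Pipes.Prelude as P

-- Hypothetical stand-in for the chunks a process would write to stdout.
fakeOutput :: [String]
fakeOutput = ["line 1", "line 2", "line 3"]

-- Push every chunk into an in-memory buffer from a background thread,
-- then drain the buffer from the calling thread.
bufferAndDrain :: [String] -> IO [String]
bufferAndDrain chunks = do
    -- `unbounded` never blocks the writer; `bounded n` would make the
    -- writer wait once n elements are queued.
    (output, input, seal) <- spawn' unbounded
    _ <- forkIO $ do
        runEffect $ each chunks >-> toOutput output
        -- Sealing the buffer tells readers that no more data is coming.
        atomically seal
    -- `fromInput` waits while the buffer is empty and terminates once the
    -- buffer is sealed and fully drained.
    P.toListM (fromInput input)

main :: IO ()
main = bufferAndDrain fakeOutput >>= mapM_ putStrLn
```

The reader never counts bytes: waiting and termination both come from the buffer itself. Plain `spawn` works too, but then termination depends on garbage collection of the write end rather than on an explicit seal.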
On 12/04/2013 04:23 PM, Daniel Díaz wrote:
To avoid the possibility of filling the output buffers and
blocking the process, while still keeping separate stdout and
stderr producers, perhaps two temporary files could be created.
Stdout would be written to one and stderr to the other. Clients
would read the temporary files as they are being written, but
would always block before reaching the "not yet written" zone (we
would ensure this by keeping track of the number of bytes written
to each file.)
Or perhaps these intermediate buffers could be kept in memory, if
they didn't grow too big.
Could this work?
On Wednesday, September 25, 2013 5:48:16 AM UTC+2, Jeremy Shaw
wrote:
On Tue, Sep 24, 2013 at 1:46 PM, John Wiegley
<[email protected]> wrote:
>>>>> Gabriel Gonzalez <[email protected]> writes:
> readProcess :: Process
>             -> Producer (Either ByteString ByteString) (SafeT m) ()
Wouldn't it be better to give two Producers, one for
stdout and one for stdin?
They can be written to at the same time by the process,
can't they? It would then seem odd that they can only be
processed in sequence.
I assume you mean one for stdout and one for *stderr*?
Alas, the unix process model is so fundamentally stupid that
I think we really need both variants. Many command-line apps
are run from the command-line where stdout and stderr are
interleaved in a somewhat arbitrary manner. But, there is
some time-based information there -- even if there is a bit
of fuzziness. For example, an app could print several lines
of success to stdout, some error message to stderr, and more
success to stdout. So, the stuff on stderr is presented in
the context of what happened around the same time on stdout.
If you treat them as two completely independent sources, then
you lose that temporal context.
So, I think it is useful to have a version that does
interleave the stdout/stderr in whatever order it seems to
get them. In theory you can just use partitionEithers to
separate them if you don't want them interleaved like that.
But that is not always the most convenient thing to do. It's
clear that there are times when it seems like having stdout
and stderr be separate Producers would be the most convenient
solution.
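
To illustrate the partitionEithers point, here is a small sketch using only `Data.Either` from base. The sample data and the convention that `Left` carries stderr and `Right` carries stdout are assumptions made for this example, not something fixed by the readProcess signature quoted above.

```haskell
import Data.Either (partitionEithers)

-- Hypothetical interleaved capture, in arrival order.
-- Assumed convention for this sketch: Left = stderr, Right = stdout.
interleaved :: [Either String String]
interleaved =
    [ Right "ok 1"
    , Right "ok 2"
    , Left "warning: disk nearly full"
    , Right "ok 3"
    ]

-- partitionEithers returns (all the Lefts, all the Rights), each in
-- its original relative order.
splitStreams :: [Either String String] -> ([String], [String])
splitStreams = partitionEithers

main :: IO ()
main = do
    let (errs, outs) = splitStreams interleaved
    mapM_ (putStrLn . ("stderr: " ++)) errs
    mapM_ (putStrLn . ("stdout: " ++)) outs
```

After the split, nothing records that the warning arrived between "ok 2" and "ok 3", which is the temporal context that only the interleaved version keeps.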
On the other hand -- I think there is a real danger to have
two Producers, one for stdout and one for stderr. Let's say
you only care about stdout and you don't do anything with
stderr. Since you are ignoring it, nobody is reading from
stderr and now stderr is at risk of blocking due to having a
full output buffer, and the whole process may then block.
Even worse, maybe you do care about stdout and stderr, but
you try to do something where you first write all of stdout to a
file, and then all of stderr. You could still end up blocked.
If you want to safely process stdout and stderr separately,
then I think you must do that in separate threads so that you
don't deadlock?
I think it is necessary that we always read data from stdout
and stderr when it becomes available, though we can choose to
discard one or the other if we don't actually want it.
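
A rough sketch of the "separate threads" approach, using only `System.Process` and `Control.Concurrent` rather than any pipes API. The helper names `drainAsync` and `runDraining` are made up for this example, and the `sh -c` command in `main` assumes a POSIX shell is available.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (evaluate)
import System.Exit (ExitCode)
import System.IO (Handle, hGetContents)
import System.Process
    (CreateProcess (..), StdStream (CreatePipe), createProcess, proc,
     waitForProcess)

-- Read a handle to the end on its own thread. The returned action
-- blocks until that thread has seen end-of-file.
drainAsync :: Handle -> IO (IO String)
drainAsync h = do
    done <- newEmptyMVar
    _ <- forkIO $ do
        contents <- hGetContents h
        -- Force the lazy string so the stream is really consumed here.
        _ <- evaluate (length contents)
        putMVar done contents
    return (takeMVar done)

-- Run a command while draining stdout and stderr concurrently, so that
-- neither pipe can fill up while we are busy with the other one.
runDraining :: FilePath -> [String] -> IO (ExitCode, String, String)
runDraining cmd args = do
    (_, Just hOut, Just hErr, ph) <-
        createProcess (proc cmd args)
            { std_out = CreatePipe, std_err = CreatePipe }
    waitOut <- drainAsync hOut
    waitErr <- drainAsync hErr
    out  <- waitOut
    err  <- waitErr
    code <- waitForProcess ph
    return (code, out, err)

main :: IO ()
main = do
    (code, out, err) <-
        runDraining "sh" ["-c", "echo to-stdout; echo to-stderr 1>&2"]
    print code
    putStr out
    putStr err
```

Both readers start before either result is awaited, so the order in which the caller later looks at stdout and stderr no longer matters. Discarding a stream is then just a matter of ignoring one of the results. The sketch does no exception handling: if a reader thread dies, the corresponding wait never completes.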
Now, we should also note that a similar problem exists in the
current code. If we start the process and use only
writeProcess, but not readProcess, then the process might
block trying to write output and the input will never get read.
So modeling a process as a Pipe does not work, but modeling
it as an independent Consumer and Producer is not entirely
correct either. There is, in fact, some interaction between
the Consumer and Producer ends of a process -- but not in a
way that we can really reason about?
Still, I feel like allowing the user to read only stdout or
only stderr is asking for more trouble than allowing the user to
call only readProcess or only writeProcess.
Unfortunately, it is extremely easy to deadlock when calling
a unix process that streams both inputs and outputs. I wonder
if there is another way we can wrap a process into a pipe
that is safer?
- jeremy
--
You received this message because you are subscribed to the
Google Groups "Haskell Pipes" group.
To unsubscribe from this group and stop receiving emails from it,
send an email to [email protected].
To post to this group, send email to [email protected].