(Sorry if this thread is too old to resuscitate.) I have kept working on my process-streaming package. Previous versions had an ugly API; hopefully the new one (0.5.0.x) is a bit more intuitive. I would like some feedback on it: http://hackage.haskell.org/package/process-streaming
To avoid the "one of the standard streams is not drained and causes a deadlock" issue, I defined a "Siphon" type that represents a computation that always completely drains a Producer. stdout and stderr can only be consumed through Siphons. Consuming stdout and stderr combined in the same stream is supported. There's also support for a (limited) form of process pipelines.

On Sunday, February 23, 2014 3:14:30 AM UTC+1, Gabriel Gonzalez wrote:
>
> Alright, I wrote up what I had in mind and you can find my draft here:
>
> https://github.com/Gabriel439/pipes-process
>
> On 02/17/2014 05:10 AM, Daniel Díaz wrote:
>
> I recently had to work with Ruby's 'Open3' package, and that got me
> thinking about this thread again.
>
> I have cobbled together a few helper functions and wrappers over
> System.Process that implement some of the ideas floated in the thread.
> Ideas like avoiding deadlock by reading continuously from the handles and
> buffering the results in memory. I've also tried to avoid throwing
> exceptions, making errors explicit in the type signatures.
>
> The repo is at https://github.com/danidiaz/process-streaming
>
> and some examples at:
> https://github.com/danidiaz/process-streaming/blob/milestone2/examples/Main.hs
>
> -- stdout and stderr to different files, using pipes-safe.
> example1 :: IO (Either String ((),()))
> example1 = ec show $
>     execute2 (proc "script1.bat" [])
>              show
>              (consume "stdout.log")
>              (consume "stderr.log")
>     where
>     consume file = surely . safely . useConsumer $
>                        S.withFile file WriteMode toHandle
>
> The code is not exactly well tested, I must say.
>
> Any comments or suggestions welcome!
>
> On Wednesday, December 4, 2013 7:27:16 PM UTC+1, Gabriel Gonzalez wrote:
>>
>> If you want to keep the buffers in memory, this is exactly what
>> `pipes-concurrency` does. Just use `spawn` to create a buffer that you can
>> write to and read from at your leisure. It lets you specify a bounded or
>> unlimited buffer size.
>>
>> This will also make sure that consumers of the buffers properly wait for
>> more input when they exhaust the buffer and terminate when the buffer is
>> done. You don't need to keep track of the number of bytes written to the
>> buffer.
>>
>> I'm not certain this is the best approach yet, because I haven't had
>> time to think about it, but I just wanted to mention this potential
>> solution to what you just described.
>>
>> On 12/04/2013 04:23 PM, Daniel Díaz wrote:
>>
>> To avoid the possibility of filling the output buffers and blocking the
>> process, while still keeping separate stdout and stderr producers, perhaps
>> two temporary files could be created. Stdout would be written to one and
>> stderr to the other. Clients would read the temporary files as they are
>> being written, but would always block before reaching the "not yet written"
>> zone (we would ensure this by keeping track of the number of bytes written
>> to each file.)
>>
>> Or perhaps these intermediate buffers could be kept in memory, if they
>> didn't grow too big.
>>
>> Could this work?
>>
>> On Wednesday, September 25, 2013 5:48:16 AM UTC+2, Jeremy Shaw wrote:
>>>
>>> On Tue, Sep 24, 2013 at 1:46 PM, John Wiegley <[email protected]>
>>> wrote:
>>>
>>>> >>>>> Gabriel Gonzalez <[email protected]> writes:
>>>>
>>>> > readProcess :: Process -> Producer (Either ByteString ByteString) (SafeT m) ()
>>>>
>>>> Wouldn't it be better to give two Producers, one for stdout and one
>>>> for stdin?
>>>> They can be written to at the same time by the process, can't they? It
>>>> would then seem odd that they can only be processed in sequence.
>>>>
>>>
>>> I assume you mean one for stdout and one for *stderr*?
>>>
>>> Alas, the unix process model is so fundamentally stupid that I think
>>> we really need both variants. Many command-line apps are run from the
>>> command-line where stdout and stderr are interleaved in a somewhat
>>> arbitrary manner.
>>> But, there is some time-based information there -- even
>>> if there is a bit of fuzziness. For example, an app could print several
>>> lines of success to stdout, some error message to stderr, and more success
>>> to stdout. So, the stuff on stderr is presented in the context of what
>>> happened around the same time on stdout.
>>>
>>> If you treat them as two completely independent sources, then you lose
>>> that temporal context.
>>>
>>> So, I think it is useful to have a version that does interleave the
>>> stdout/stderr in whatever order it seems to get them. In theory you can
>>> just use partitionEithers to separate them if you don't want them
>>> interleaved like that. But that is not always the most convenient thing to
>>> do. It's clear that there are times when it seems like having stdout and
>>> stderr be separate Producers would be the most convenient solution.
>>>
>>> On the other hand -- I think there is a real danger in having two
>>> Producers, one for stdout and one for stderr. Let's say you only care about
>>> stdout and you don't do anything with stderr. Since you are ignoring it,
>>> nobody is reading from stderr and now stderr is at risk of blocking due to
>>> having a full output buffer, and the whole process may then block. Even
>>> worse, maybe you do care about stdout and stderr, but you try to do something
>>> where you first write all of stdout to a file, and then all of stderr. You
>>> could still end up blocked. If you want to safely process stdout and stderr
>>> separately, then I think you must do that in separate threads so that you
>>> don't deadlock?
>>>
>>> I think it is necessary that we always read data from stdout and
>>> stderr when it becomes available, though we can choose to discard one or
>>> the other if we don't actually want it.
>>>
>>> Now, we should also note that a similar problem exists in the current
>>> code.
>>> If we start the process and use only writeProcess, but not
>>> readProcess, then the process might block trying to write output and the
>>> input will never get read.
>>>
>>> So modeling a process as a Pipe does not work, but modeling it as an
>>> independent Consumer and Producer is not entirely correct either. There is,
>>> in fact, some interaction between the Consumer and Producer ends of a
>>> process -- but not in a way that we can really reason about?
>>>
>>> Still, I feel like allowing the user to read only stdout or only stderr
>>> is asking for more trouble than allowing the user to call only readProcess
>>> or only writeProcess.
>>>
>>> Unfortunately, it is extremely easy to deadlock when calling a unix
>>> process that streams both inputs and outputs. I wonder if there is another
>>> way we can wrap a process into a pipe that is safer?
>>>
>>> - jeremy
>>>
>>> --
>> You received this message because you are subscribed to the Google Groups
>> "Haskell Pipes" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To post to this group, send email to [email protected].
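
For anyone following the deadlock discussion in this thread, here is a minimal sketch of the "always read both stdout and stderr" idea, using only base and the process package. It is not the process-streaming or pipes-process API, and the name runDraining is hypothetical. The assumption is that both outputs fit in memory: each handle is read to the end in its own thread, so neither pipe can fill up while the other one is being read.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (evaluate)
import System.Exit (ExitCode (..))
import System.IO (Handle, hGetContents)
import System.Process

-- | Hypothetical helper (not part of any library mentioned in the thread).
-- Runs a process with stdout and stderr piped, and drains both handles
-- concurrently so the child cannot block on a full pipe.
runDraining :: CreateProcess -> IO (ExitCode, String, String)
runDraining cp = do
  (_, Just hOut, Just hErr, ph) <-
    createProcess cp { std_out = CreatePipe, std_err = CreatePipe }
  outVar <- newEmptyMVar
  errVar <- newEmptyMVar
  -- One reader thread per stream; each reads until EOF.
  _ <- forkIO (drain hOut >>= putMVar outVar)
  _ <- forkIO (drain hErr >>= putMVar errVar)
  out  <- takeMVar outVar
  err  <- takeMVar errVar
  code <- waitForProcess ph
  return (code, out, err)
  where
    -- hGetContents is lazy, so force the whole string to really
    -- consume the handle inside the reader thread.
    drain :: Handle -> IO String
    drain h = do
      s <- hGetContents h
      _ <- evaluate (length s)
      return s
```

For example, runDraining (shell "echo out; echo err 1>&2") should return both strings along with the exit code. A caller that only cares about stdout can simply ignore the third component; stderr is still drained. This sketch does no error handling in the reader threads and buffers everything in memory, so it is only meant to illustrate the threading point made above, not to replace a streaming interface.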
