I recently had to work with Ruby's 'Open3' package, and that got me 
thinking about this thread again. 

I have cobbled together a few helper functions and wrappers over 
System.Process that implement some of the ideas floated in the thread, 
such as avoiding deadlock by reading continuously from the handles and 
buffering the results in memory. I've also tried to avoid throwing 
exceptions, making errors explicit in the type signatures instead. 
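
To illustrate the "read continuously from both handles" idea in isolation, 
here is a minimal sketch that uses only base and the process package. It is 
not the process-streaming API: `drain` and `runBoth` are hypothetical names 
made up for this example, and the whole output is simply held in memory.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)
import Control.Exception (evaluate)
import System.Exit (ExitCode (..))
import System.IO (Handle, hGetContents)
import System.Process

-- Hypothetical helper: read a handle to EOF on its own thread.
-- The MVar is filled once the whole stream has been buffered in memory.
drain :: Handle -> IO (MVar String)
drain h = do
    var <- newEmptyMVar
    _ <- forkIO $ do
        s <- hGetContents h
        _ <- evaluate (length s)  -- force the lazy read to finish
        putMVar var s
    return var

-- Hypothetical helper: run a process, draining stdout and stderr
-- concurrently so that neither pipe can fill up and block the child.
runBoth :: CreateProcess -> IO (ExitCode, String, String)
runBoth cp = do
    (_, Just hout, Just herr, ph) <-
        createProcess cp { std_out = CreatePipe, std_err = CreatePipe }
    outVar <- drain hout   -- both readers start before we wait on either
    errVar <- drain herr
    out  <- takeMVar outVar
    err  <- takeMVar errVar
    code <- waitForProcess ph
    return (code, out, err)
```

For instance, `runBoth (shell "echo out; echo err 1>&2")` should hand back 
the two streams separately along with the exit code. The sketch ignores 
exceptions in the reader threads, which a real wrapper would have to handle.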

The repo is at https://github.com/danidiaz/process-streaming 

and some examples are at: 
https://github.com/danidiaz/process-streaming/blob/milestone2/examples/Main.hs

-- stdout and stderr to different files, using pipes-safe.
example1 :: IO (Either String ((),()))
example1 = ec show $
    execute2 (proc "script1.bat" [])
             show  
             (consume "stdout.log")
             (consume "stderr.log")
    where
    consume file = surely . safely . useConsumer $
                       S.withFile file WriteMode toHandle

The code is not exactly well tested, I must say.

Any comments or suggestions welcome!
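
For comparison, the `spawn`-based in-memory buffer suggested in the quoted 
message below might look roughly like this. It is only a sketch, assuming 
the pipes and pipes-concurrency packages; `bufferedLines` is a hypothetical 
name, and a plain list stands in for a process handle. It uses `spawn'` so 
the buffer can be sealed explicitly instead of relying on garbage collection.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.STM (atomically)
import Pipes
import Pipes.Concurrent (fromInput, spawn', toOutput, unbounded)
import qualified Pipes.Prelude as P

-- Hypothetical helper: push values through an unbounded in-memory buffer.
-- A writer thread fills the buffer; the caller reads from it at leisure.
bufferedLines :: [String] -> IO [String]
bufferedLines xs = do
    (output, input, seal) <- spawn' unbounded
    _ <- forkIO $ do
        runEffect $ each xs >-> toOutput output  -- writer side
        atomically seal                          -- signal "no more input"
    -- The reader waits when the buffer is empty and stops once it is
    -- both sealed and drained.
    P.toListM (fromInput input)
```

Swapping `unbounded` for `bounded n` would cap the memory used, at the cost 
of making the writer wait when the buffer is full.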

On Wednesday, December 4, 2013 7:27:16 PM UTC+1, Gabriel Gonzalez wrote:
>
>  If you want to keep the buffers in memory, this is exactly what 
> `pipes-concurrency` does.  Just use `spawn` to create a buffer that you can 
> write to and read from at your leisure.  It lets you specify a bounded or 
> unlimited buffer size.
>
> This will also make sure that consumers of the buffers properly wait for 
> more input when they exhaust the buffer and terminate when the buffer is 
> done.  You don't need to keep track of the number of bytes written to the 
> buffer.
>
> I'm not certain this is the best approach, because I haven't had time 
> to think about this yet, but I just wanted to mention this potential 
> solution to what you just described.
>
> On 12/04/2013 04:23 PM, Daniel Díaz wrote:
>  
> To avoid the possibility of filling the output buffers and blocking the 
> process, while still keeping separate stdout and stderr producers, perhaps 
> two temporary files could be created. Stdout would be written to one and 
> stderr to the other. Clients would read the temporary files as they are 
> being written, but would always block before reaching the "not yet written" 
> zone (we would ensure this by keeping track of the number of bytes written 
> to each file.)
>
> Or perhaps these intermediate buffers could be kept in memory, if they 
> didn't grow too big.
>
> Could this work?
>
> On Wednesday, September 25, 2013 5:48:16 AM UTC+2, Jeremy Shaw wrote: 
>>
>> On Tue, Sep 24, 2013 at 1:46 PM, John Wiegley <[email protected]> wrote:
>>  
>>> >>>>> Gabriel Gonzalez <[email protected]> writes:
>>>
>>> >     readProcess :: Process -> Producer (Either ByteString ByteString) (SafeT m) ()
>>>
>>>  Wouldn't it be better to give two Producers, one for stdout and one for 
>>> stdin? They can be written to at the same time by the process, can't 
>>> they? It would then seem odd that they can only be processed in sequence.
>>>
>>
>>  I assume you mean one for stdout and one for *stderr*?
>>
>>  Alas, the unix process model is so fundamentally stupid that I think we 
>> really need both variants. Many command-line apps are run from the 
>> command-line where stdout and stderr are interleaved in a somewhat 
>> arbitrary manner. But, there is some time-based information there -- even 
>> if there is a bit of fuzziness. For example, an app could print several 
>> lines of success to stdout, some error message to stderr, and more success 
>> to stdout. So, the stuff on stderr is presented in the context of what 
>> happened around the same time on stdout.
>>
>>  If you treat them as two completely independent sources, then you lose 
>> that temporal context.
>>
>>  So, I think it is useful to have a version that does interleave the 
>> stdout/stderr in whatever order it seems to get them. In theory you can 
>> just use partitionEithers to separate them if you don't want them 
>> interleaved like that. But that is not always the most convenient thing to 
>> do. It's clear that there are times when it seems like having stdout and 
>> stderr be separate Producers would be the most convenient solution.
>>
>>  On the other hand -- I think there is a real danger in having two 
>> Producers, one for stdout and one for stderr. Let's say you only care about 
>> stdout and you don't do anything with stderr. Since you are ignoring it, 
>> nobody is reading from stderr and now stderr is at risk of blocking due to 
>> having a full output buffer, and the whole process may then block. Even 
>> worse, maybe you do care about stdout and stderr, but you try to do 
>> something where you first write all of stdout to a file, and then all of 
>> stderr. You could still end up blocked. If you want to safely process 
>> stdout and stderr separately, then I think you must do that in separate 
>> threads so that you don't deadlock?
>>
>>  I think it is necessary that we always read data from stdout and stderr 
>> when it becomes available, though we can choose to discard one or the other 
>> if we don't actually want it.
>>
>>  Now, we should also note that a similar problem exists in the current 
>> code. If we start the process and use only writeProcess, but not 
>> readProcess, then the process might block trying to write output and the 
>> input will never get read.
>>
>>  So modeling a process as a Pipe does not work, but modeling it as an 
>> independent Consumer and Producer is not entirely correct either. There is, 
>> in fact, some interaction between the Consumer and Producer ends of a 
>> process -- but not in a way that we can really reason about?
>>
>>  Still, I feel like allowing the user to read only stdout or only stderr 
>> is asking for more trouble than allowing the user to call only readProcess 
>> vs only writeProcess.
>>
>>  Unfortunately, it is extremely easy to deadlock when calling a unix 
>> process that streams both inputs and outputs. I wonder if there is another 
>> way we can wrap a process into a pipe that is safer?
>>
>>  - jeremy
>>
>>  
>>  
>>  
>>    -- 
> You received this message because you are subscribed to the Google Groups 
> "Haskell Pipes" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected].
> To post to this group, send email to [email protected].
>
>
> 
