(Sorry if this is too old a thread to resuscitate.)

I have kept working on my process-streaming package. Previous versions had 
an ugly API; hopefully the new one (0.5.0.x) is a bit more intuitive. I 
would like some feedback on it: 
http://hackage.haskell.org/package/process-streaming

To avoid the "one of the standard streams is not drained and causes a 
deadlock" issue, I defined a "Siphon" type that represents a computation 
that always completely drains a Producer. stdout and stderr can only be 
consumed through Siphons.

Consuming stdout and stderr combined in the same stream is supported. 
There's also support for a (limited) form of process pipelines.
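For readers skimming the thread, the core idea can be sketched roughly like this. This is an illustrative simplification, not the package's actual definition of `Siphon` (the real type is more elaborate); the point is a consumer that is guaranteed to drain its Producer to the end:

```haskell
-- Illustrative sketch only: a "Siphon" wraps a computation that always
-- consumes its Producer completely, so a standard stream can never be
-- left unread and block the child process.
import Pipes
import qualified Pipes.Prelude as P
import Data.ByteString (ByteString)

newtype Siphon b e a = Siphon
  { runSiphon :: Producer b IO () -> IO (Either e a) }

-- A Siphon that counts the chunks it receives, draining everything.
countChunks :: Siphon ByteString e Int
countChunks = Siphon (\producer -> Right <$> P.length producer)

-- A Siphon that ignores its input but still drains it completely.
drainAll :: Siphon ByteString e ()
drainAll = Siphon (\producer -> Right <$> runEffect (producer >-> P.drain))
```

Because stdout and stderr can only be consumed through values of this shape, "forgot to read one of the streams" is ruled out by construction.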


On Sunday, February 23, 2014 3:14:30 AM UTC+1, Gabriel Gonzalez wrote:
>
>  Alright, I wrote up what I had in mind and you can find my draft here:
>
> https://github.com/Gabriel439/pipes-process
>
> On 02/17/2014 05:10 AM, Daniel Díaz wrote:
>  
>  I recently had to work with Ruby's 'Open3' package, and that got me 
> thinking about this thread again. 
>
>  I have cobbled together a few helper functions and wrappers over 
> System.Process that implement some of the ideas floated in this thread, 
> like avoiding deadlock by reading continuously from the handles and 
> buffering the results in memory. I've also tried to avoid throwing 
> exceptions, making errors explicit in the type signatures. 
>
>  The repo is at https://github.com/danidiaz/process-streaming 
>
>  and some examples at: 
> https://github.com/danidiaz/process-streaming/blob/milestone2/examples/Main.hs
>
>  -- stdout and stderr to different files, using pipes-safe.
>  example1 :: IO (Either String ((),()))
>  example1 = ec show $
>     execute2 (proc "script1.bat" [])
>              show  
>              (consume "stdout.log")
>              (consume "stderr.log")
>     where
>     consume file = surely . safely . useConsumer $
>                        S.withFile file WriteMode toHandle
>
>  The code is not exactly well tested, I must say.
>
>  Any comments or suggestions welcome!
>
> On Wednesday, December 4, 2013 7:27:16 PM UTC+1, Gabriel Gonzalez wrote: 
>>
>>  If you want to keep the buffers in memory, this is exactly what 
>> `pipes-concurrency` does.  Just use `spawn` to create a buffer that you can 
>> write to and read from at your leisure.  It lets you specify a bounded or 
>> unlimited buffer size.
>>
>> This will also make sure that consumers of the buffers properly wait for 
>> more input when they exhaust the buffer and terminate when the buffer is 
>> done.  You don't need to keep track of the number of bytes written to the 
>> buffer.
>>
>> I'm not certain this is the best approach, because I haven't had time to 
>> think it through yet, but I just wanted to mention this potential 
>> solution to what you just described.
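(A minimal sketch of the approach described above, assuming the pipes-concurrency mailbox API — `spawn`, `bounded`, `toOutput`, `fromInput`, `performGC` — as spelled in recent versions of the package; older releases named the buffer constructors differently:)

```haskell
import Control.Concurrent (forkIO)
import Pipes
import Pipes.Concurrent  -- from the pipes-concurrency package
import qualified Pipes.Prelude as P

main :: IO ()
main = do
  -- A bounded mailbox: writers block once it holds 64 elements,
  -- so the buffer cannot grow without limit.
  (output, input) <- spawn (bounded 64)
  _ <- forkIO $ do
         runEffect $ each [1 .. 10 :: Int] >-> toOutput output
         performGC  -- lets the mailbox seal once the writer is gone
  -- The reader blocks while the mailbox is empty and terminates
  -- cleanly once the mailbox is sealed; no byte counting needed.
  runEffect $ fromInput input >-> P.print
```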
>>
>> On 12/04/2013 04:23 PM, Daniel Díaz wrote:
>>  
>> To avoid the possibility of filling the output buffers and blocking the 
>> process, while still keeping separate stdout and stderr producers, perhaps 
>> two temporary files could be created. Stdout would be written to one and 
>> stderr to the other. Clients would read the temporary files as they are 
>> being written, but would always block before reaching the "not yet written" 
>> zone (we would ensure this by keeping track of the number of bytes written 
>> to each file.)
>>
>> Or perhaps these intermediate buffers could be kept in memory, if they 
>> didn't grow too big.
>>
>> Could this work?
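(A minimal sketch of the byte-counting part of this proposal, just to make it concrete — all names here are hypothetical, not an existing API:)

```haskell
import Control.Concurrent.MVar
import qualified Data.ByteString as B
import System.IO

-- Writer side: append a chunk to the temp file and publish the new length.
appendChunk :: Handle -> MVar Integer -> B.ByteString -> IO ()
appendChunk h written chunk = do
  B.hPut h chunk
  hFlush h
  modifyMVar_ written (\n -> return (n + fromIntegral (B.length chunk)))

-- Reader side: read only up to the published length, never into the
-- "not yet written" zone.
readAvailable :: FilePath -> MVar Integer -> Integer -> IO B.ByteString
readAvailable path written alreadyRead = do
  limit <- readMVar written
  withFile path ReadMode $ \h -> do
    hSeek h AbsoluteSeek alreadyRead
    B.hGet h (fromIntegral (limit - alreadyRead))
```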
>>
>> On Wednesday, September 25, 2013 5:48:16 AM UTC+2, Jeremy Shaw wrote: 
>>>
>>> On Tue, Sep 24, 2013 at 1:46 PM, John Wiegley <[email protected]> 
>>> wrote:
>>>  
>>>> Gabriel Gonzalez <[email protected]> writes:
>>>>
>>>> >     readProcess :: Process -> Producer (Either ByteString ByteString) 
>>>> (SafeT
>>>> > m) ()
>>>>
>>>>  Wouldn't it be better to give two Producers, one for stdout and one 
>>>> for stdin?
>>>> They can both be written to at the same time by the process, can't 
>>>> they? It would then seem odd that they can only be processed in 
>>>> sequence.
>>>>
>>>
>>>  I assume you mean one for stdout and one for *stderr*?
>>>
>>>  Alas, the unix process model is so fundamentally stupid that I think 
>>> we really need both variants. Many command-line apps are run from the 
>>> command-line where stdout and stderr are interleaved in a somewhat 
>>> arbitrary manner. But, there is some time-based information there -- even 
>>> if there is a bit of fuzziness. For example, an app could print several 
>>> lines of success to stdout, some error message to stderr, and more success 
>>> to stdout. So, the stuff on stderr is presented in the context of what 
>>> happened around the same time on stdout.
>>>
>>>  If you treat them as two completely independent sources, then you lose 
>>> that temporal context.
>>>
>>>  So, I think it is useful to have a version that does interleave the 
>>> stdout/stderr in whatever order it seems to get them. In theory you can 
>>> just use partitionEithers to separate them if you don't want them 
>>> interleaved like that. But that is not always practical, and there are 
>>> clearly times when having stdout and stderr as separate Producers would 
>>> be the most convenient solution.
>>>
>>>  On the other hand -- I think there is a real danger in having two 
>>> Producers, one for stdout and one for stderr. Let's say you only care about 
>>> stdout and you don't do anything with stderr. Since you are ignoring it, 
>>> nobody is reading from stderr, and now stderr is at risk of blocking due to 
>>> having a full output buffer, and the whole process may then block. Even 
>>> worse, maybe you do care about stdout and stderr, but you try to do 
>>> something where you first write all of stdout to a file, and then all of 
>>> stderr. You could still end up blocked. If you want to safely process 
>>> stdout and stderr separately, then I think you must do that in separate 
>>> threads so that you don't deadlock.
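(A sketch of the separate-threads point, using `createProcess` from the process package and `concurrently` from async — one way to do it, not necessarily the best:)

```haskell
import Control.Concurrent.Async (concurrently)
import qualified Data.ByteString as B
import System.Exit (ExitCode)
import System.Process

-- Drain stdout and stderr in parallel, so neither pipe can fill up
-- and block the child process.
runDraining :: CreateProcess -> IO (ExitCode, B.ByteString, B.ByteString)
runDraining cp = do
  (_, Just hout, Just herr, ph) <-
      createProcess cp { std_out = CreatePipe, std_err = CreatePipe }
  (out, err) <- concurrently (B.hGetContents hout) (B.hGetContents herr)
  code <- waitForProcess ph
  return (code, out, err)
```

Discarding a stream you don't care about is then just a matter of throwing away one component of the tuple -- the draining still happens.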
>>>
>>>  I think it is necessary that we always read data from stdout and 
>>> stderr when it becomes available, though we can choose to discard one or 
>>> the other if we don't actually want it.
>>>
>>>  Now, we should also note that a similar problem exists in the current 
>>> code. If we start the process and use only writeProcess, but not 
>>> readProcess, then the process might block trying to write output and the 
>>> input will never get read.
>>>
>>>  So modeling a process as a Pipe does not work, but modeling it as an 
>>> independent Consumer and Producer is not entirely correct either. There is, 
>>> in fact, some interaction between the Consumer and Producer ends of a 
>>> process -- but not in a way that we can easily reason about.
>>>
>>>  Still... I feel like allowing the user to read only stdout or only 
>>> stderr is asking for more trouble than allowing the user to call only 
>>> readProcess vs. only writeProcess.
>>>
>>>  Unfortunately, it is extremely easy to deadlock when calling a unix 
>>> process that streams both inputs and outputs. I wonder if there is another 
>>> way we can wrap a process into a pipe that is safer?
>>>
>>>  - jeremy
>>>
>>>  
>>>  
>>>  
>>>    -- 
>> You received this message because you are subscribed to the Google Groups 
>> "Haskell Pipes" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> To post to this group, send email to [email protected].
>>
>>
