> On Nov 11, 2015, at 12:44 AM, Alexander Rojas <alexan...@mesosphere.io> wrote:
> 
> What I meant is that we may not care about SIGPIPE (which tells us a pipe was 
> broken) because we will be notified when we try to write into it anyway (on 
> the writing side) and we will get an EOF on the reading side.
> 
> The only reason I could see for us to care about SIGPIPE is if we want to 
> know that the pipe broke as soon as it happens.

So it sounds like there is no objection to this change? Can we land these 
changes now?
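
For anyone following along, here is a minimal sketch of the approach discussed
in this thread: ignore SIGPIPE process-wide and rely on write() returning -1
with errno set to EPIPE. This is not the code from the reviews; the helper
names ignore_sigpipe and write_to_broken_pipe are hypothetical and exist only
for illustration.

```cpp
#include <cassert>
#include <cerrno>
#include <csignal>
#include <unistd.h>

// Hypothetical helper: set the SIGPIPE disposition to "ignore" for the
// whole process. Returns true on success.
bool ignore_sigpipe()
{
  return std::signal(SIGPIPE, SIG_IGN) != SIG_ERR;
}

// Hypothetical helper: write one byte to a pipe that has no reader.
// Returns the errno reported by write(), or 0 if the write succeeded.
// Call ignore_sigpipe() first; with the default disposition the signal
// would terminate the process before write() returns.
int write_to_broken_pipe()
{
  int fds[2];
  if (::pipe(fds) != 0) {
    return errno;
  }
  assert(fds[0] >= 0 && fds[1] >= 0);

  ::close(fds[0]); // Close the read end so the pipe is "broken".

  const char byte = 'x';
  const ssize_t n = ::write(fds[1], &byte, 1);
  const int error = (n == -1) ? errno : 0;

  ::close(fds[1]);
  return error;
}
```

With SIGPIPE ignored, write_to_broken_pipe() should report EPIPE, which the
caller can handle like any other write error.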

>> On 06 Nov 2015, at 19:10, Benjamin Mahler <benjamin.mah...@gmail.com> wrote:
>> 
>> To answer your questions:
>> 
>> We use pipes when we need to communicate across the process boundary after
>> a fork. Look for Subprocess::IO::Pipe for examples. There is plenty of code
>> using pipes.
>> 
>> Sockets aren't an issue as one can avoid SIGPIPE across OS X (SO_NOSIGPIPE)
>> and Linux (MSG_NOSIGNAL).
>> 
>> I'm a bit confused by your comment about the timing of SIGPIPE, which seems
>> to suggest that the raising of SIGPIPE is not tied to the bad write call.
>> Why do you think this?
>> 
>> On Fri, Nov 6, 2015 at 4:37 AM, Alexander Rojas <alexan...@mesosphere.io>
>> wrote:
>> 
>>> I have multiple questions here
>>> 
>>> 1. Why do we use pipes at all? or is SIGPIPE raised also when writing into
>>> sockets? which leads me to:
>>> 2. Do we use it only in test cases or is there something actively using
>>> pipes?
>>> 
>>> SIGPIPE itself is a weird signal, since a failed call to `write` returns
>>> -1 and sets `errno` to `EPIPE` so there are two ways to deal with errors
>>> when the reading process is no longer reading, one is handling the return
>>> value+errno (which usually means ignoring the SIGPIPE) and the second is
>>> ignoring the return value and handling SIGPIPE. The difference is that
>>> SIGPIPE is raised as soon as the OS realizes the pipe is broken while the
>>> error on the write happens when you actually try to write on the pipe.
>>> 
>>> All in all, I prefer to ignore the signal and deal with the return value
>>> of `write`.
>>> 
>>>> On 06 Nov 2015, at 03:27, Benjamin Mahler <benjamin.mah...@gmail.com> wrote:
>>>> 
>>>> Just want to surface this up to the dev@ thread to raise some awareness.
>>>> Recently with the SIGPIPE bug from libev [1], we've revisited whether it
>>>> makes sense to continue down the path of leaving SIGPIPE unblocked and
>>>> trying to handle it case by case.
>>>> 
>>>> We originally wanted users of libprocess to decide on their own whether
>>>> they want to ignore SIGPIPE. However, we'd like to reconsider:
>>>> 
>>>> (a) The amount of code that is needed to work around SIGPIPE is
>>>> substantial, especially because on OS X SIGPIPE appears to not be
>>>> delivered synchronously [2]. Also, it is not possible to create pipes
>>>> that don't surface SIGPIPE (unlike sockets), so in order to safely write
>>>> to a pipe we need to wrap write() calls with signal suppression blocks
>>>> (which we don't do in general!). You can get a sense of the code from
>>>> [3] and [4].
>>>> 
>>>> (b) SIGPIPE seems to be more of a legacy mechanism to shut down a set of
>>>> piped programs and the general recommendation seems to be to not bother
>>>> with it and ignore it. Programs can handle EPIPE as they would any other
>>>> error.
>>>> 
>>>> Would love to hear if there are any concerns. I will be glad to shepherd
>>>> James' changes here.
>>>> 
>>>> [1] https://issues.apache.org/jira/browse/MESOS-2768
>>>> [2] https://issues.apache.org/jira/browse/MESOS-2079
>>>> [3] https://reviews.apache.org/r/39940/diff/1#index_header
>>>> [4] https://github.com/apache/mesos/blob/0.25.0/3rdparty/libprocess/3rdparty/stout/include/stout/os/posix/signals.hpp#L101
>>>> 
>>>> On Wed, Nov 4, 2015 at 9:20 AM, James Peach (JIRA) <j...@apache.org> wrote:
>>>> 
>>>>> 
>>>>> [ https://issues.apache.org/jira/browse/MESOS-2079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14989947#comment-14989947 ]
>>>>> 
>>>>> James Peach edited comment on MESOS-2079 at 11/4/15 5:19 PM:
>>>>> -------------------------------------------------------------
>>>>> 
>>>>> These patches globally ignore {{SIGPIPE}} during libprocess
>>>>> initialization, document {{SIGPIPE}} behavior a bit more, and remove
>>>>> various signal manipulations that were formerly necessary for disabling
>>>>> {{SIGPIPE}} delivery.
>>>>> 
>>>>> https://reviews.apache.org/r/39938/
>>>>> https://reviews.apache.org/r/39940/
>>>>> https://reviews.apache.org/r/39941/
>>>>> 
>>>>>> IO.Write test is flaky on OS X 10.10.
>>>>>> -------------------------------------
>>>>>> 
>>>>>>              Key: MESOS-2079
>>>>>>              URL: https://issues.apache.org/jira/browse/MESOS-2079
>>>>>>          Project: Mesos
>>>>>>       Issue Type: Task
>>>>>>       Components: libprocess, technical debt, test
>>>>>>      Environment: OS X 10.10
>>>>>> {noformat}
>>>>>> $ clang++ --version
>>>>>> Apple LLVM version 6.0 (clang-600.0.54) (based on LLVM 3.5svn)
>>>>>> Target: x86_64-apple-darwin14.0.0
>>>>>> Thread model: posix
>>>>>> {noformat}
>>>>>>         Reporter: Benjamin Mahler
>>>>>>         Assignee: James Peach
>>>>>>           Labels: flaky
>>>>>> 
>>>>>> [~benjaminhindman]: If I recall correctly, this is related to
>>>>>> MESOS-1658. Unfortunately, we don't have a stacktrace for SIGPIPE
>>>>>> currently:
>>>>>> {noformat}
>>>>>> [ RUN      ] IO.Write
>>>>>> make[5]: *** [check-local] Broken pipe: 13
>>>>>> {noformat}
>>>>>> Running in gdb, seems to always occur here:
>>>>>> {code}
>>>>>> Program received signal SIGPIPE, Broken pipe.
>>>>>> [Switching to process 56827 thread 0x60b]
>>>>>> 0x00007fff9a011132 in __psynch_cvwait ()
>>>>>> (gdb) where
>>>>>> #0  0x00007fff9a011132 in __psynch_cvwait ()
>>>>>> #1  0x00007fff903e7ea0 in _pthread_cond_wait ()
>>>>>> #2  0x000000010062f27c in Gate::arrive (this=0x101908a10, old=14780) at gate.hpp:82
>>>>>> #3  0x0000000100600888 in process::schedule (arg=0x0) at src/process.cpp:1373
>>>>>> #4  0x00007fff903e72fc in _pthread_body ()
>>>>>> #5  0x00007fff903e7279 in _pthread_start ()
>>>>>> #6  0x00007fff903e54b1 in thread_start ()
>>>>>> {code}
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>> 
>>> 
> 
