> On Nov 11, 2015, at 12:44 AM, Alexander Rojas <alexan...@mesosphere.io> wrote:
>
> What I meant is that we may not care about SIGPIPE (which tells us a pipe was
> broken) because we will be notified when we try to write into it anyway (on
> the writing side) and we will get an EOF on the reading side.
>
> The only reason I could see for us to care about SIGPIPE is if we want to
> know that the pipe broke as soon as it happens.
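To make the quoted point concrete, here is a minimal sketch of the behavior described, assuming SIGPIPE is ignored process-wide. It is not code from libprocess or from the patches under review; the helper names (ignoreSigpipe, errnoFromWriteToBrokenPipe, readFromPipeWithoutWriter) are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cerrno>
#include <csignal>
#include <unistd.h>

// Ignore SIGPIPE process-wide so that a write to a broken pipe fails with
// EPIPE instead of terminating the process.
void ignoreSigpipe()
{
  signal(SIGPIPE, SIG_IGN);
}

// Hypothetical helper: writes one byte into a pipe whose read end has
// already been closed. Returns the errno reported by write(), or 0 if the
// write succeeded. Assumes SIGPIPE is ignored by the caller.
int errnoFromWriteToBrokenPipe()
{
  int fds[2];
  int rc = pipe(fds);
  assert(rc == 0);
  (void) rc;

  close(fds[0]); // No reader is left, so the pipe is now broken.

  ssize_t n = write(fds[1], "x", 1);
  int error = (n == -1) ? errno : 0;

  close(fds[1]);
  return error;
}

// Hypothetical helper: reads from a pipe whose write end has already been
// closed. Returns the result of read(); 0 means end-of-file.
ssize_t readFromPipeWithoutWriter()
{
  int fds[2];
  int rc = pipe(fds);
  assert(rc == 0);
  (void) rc;

  close(fds[1]); // No writer is left.

  char c;
  ssize_t n = read(fds[0], &c, 1);

  close(fds[0]);
  return n;
}
```

After calling ignoreSigpipe(), the writing side should see EPIPE from write() and the reading side should see read() return 0, which is the notification on both sides that the quoted message refers to.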
So it sounds like there is no objection to this change? Can we land these changes now?

>> On 06 Nov 2015, at 19:10, Benjamin Mahler <benjamin.mah...@gmail.com> wrote:
>>
>> To answer your questions:
>>
>> We use pipes when we need to communicate across the process boundary after
>> a fork. Look for Subprocess::IO::Pipe for examples. There is plenty of code
>> using pipes.
>>
>> Sockets aren't an issue as one can avoid SIGPIPE across OS X (SO_NOSIGPIPE)
>> and Linux (MSG_NOSIGNAL).
>>
>> I'm a bit confused by your comment about the timing of SIGPIPE, which seems
>> to suggest that the raising of SIGPIPE is not tied to the bad write call.
>> Why do you think this?
>>
>> On Fri, Nov 6, 2015 at 4:37 AM, Alexander Rojas <alexan...@mesosphere.io>
>> wrote:
>>
>>> I have multiple questions here
>>>
>>> 1. Why do we use pipes at all? or is SIGPIPE raised also when writing into
>>> sockets? which leads me to:
>>> 2. Do we use it only in test cases or is there something actively using
>>> pipes?
>>>
>>> SIGPIPE itself is a weird signal, since a failed call to `write` returns
>>> -1 and sets `errno` to `EPIPE` so there are two ways to deal with errors
>>> when the reading process is no longer reading, one is handling the return
>>> value+errno (which usually means ignoring the SIGPIPE) and the second is
>>> ignoring the return value and handling SIGPIPE. The difference is that
>>> SIGPIPE is raised as soon as the OS realizes the pipe is broken while the
>>> error on the write happens when you actually try to write on the pipe.
>>>
>>> All in all, I prefer to ignore the signal and deal with the return value
>>> of `write`.
>>>
>>>> On 06 Nov 2015, at 03:27, Benjamin Mahler <benjamin.mah...@gmail.com>
>>>> wrote:
>>>>
>>>> Just want to surface this up to the dev@ thread to raise some awareness.
>>>> Recently with the SIGPIPE bug from libev [1], we've revisited whether it
>>>> makes sense to continue down the path of leaving SIGPIPE unblocked and
>>>> trying to handle it case by case.
>>>>
>>>> We originally wanted users of libprocess to decide on their own whether
>>>> they want to ignore SIGPIPE. However, we'd like to reconsider:
>>>>
>>>> (a) The amount of code that is needed to work around SIGPIPE is
>>>> substantial, especially because on OS X SIGPIPE appears to not be
>>>> delivered synchronously [2]. Also, it is not possible to create pipes
>>>> that don't surface SIGPIPE (unlike sockets), so in order to safely write
>>>> to a pipe we need to wrap write() calls with signal suppression blocks
>>>> (which we don't do in general!). You can get a sense of the code from
>>>> [3] and [4].
>>>>
>>>> (b) SIGPIPE seems to be more of a legacy mechanism to shut down a set of
>>>> piped programs and the general recommendation seems to be to not bother
>>>> with it and ignore it. Programs can handle EPIPE as they would any other
>>>> error.
>>>>
>>>> Would love to hear if there are any concerns. I will be glad to shepherd
>>>> James' changes here.
>>>>
>>>> [1] https://issues.apache.org/jira/browse/MESOS-2768
>>>> [2] https://issues.apache.org/jira/browse/MESOS-2079
>>>> [3] https://reviews.apache.org/r/39940/diff/1#index_header
>>>> [4] https://github.com/apache/mesos/blob/0.25.0/3rdparty/libprocess/3rdparty/stout/include/stout/os/posix/signals.hpp#L101
>>>>
>>>> On Wed, Nov 4, 2015 at 9:20 AM, James Peach (JIRA) <j...@apache.org>
>>>> wrote:
>>>>
>>>>> [ https://issues.apache.org/jira/browse/MESOS-2079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14989947#comment-14989947 ]
>>>>>
>>>>> James Peach edited comment on MESOS-2079 at 11/4/15 5:19 PM:
>>>>> -------------------------------------------------------------
>>>>>
>>>>> These patches globally ignore {{SIGPIPE}} during libprocess
>>>>> initialization, document {{SIGPIPE}} behavior a bit more, and remove
>>>>> various signal manipulations that were formerly necessary for disabling
>>>>> {{SIGPIPE}} delivery.
>>>>>
>>>>> https://reviews.apache.org/r/39938/
>>>>> https://reviews.apache.org/r/39940/
>>>>> https://reviews.apache.org/r/39941/
>>>>>
>>>>> was (Author: jamespeach):
>>>>> https://reviews.apache.org/r/39938/
>>>>> https://reviews.apache.org/r/39940/
>>>>> https://reviews.apache.org/r/39941/
>>>>>
>>>>>> IO.Write test is flaky on OS X 10.10.
>>>>>> -------------------------------------
>>>>>>
>>>>>> Key: MESOS-2079
>>>>>> URL: https://issues.apache.org/jira/browse/MESOS-2079
>>>>>> Project: Mesos
>>>>>> Issue Type: Task
>>>>>> Components: libprocess, technical debt, test
>>>>>> Environment: OS X 10.10
>>>>>> {noformat}
>>>>>> $ clang++ --version
>>>>>> Apple LLVM version 6.0 (clang-600.0.54) (based on LLVM 3.5svn)
>>>>>> Target: x86_64-apple-darwin14.0.0
>>>>>> Thread model: posix
>>>>>> {noformat}
>>>>>> Reporter: Benjamin Mahler
>>>>>> Assignee: James Peach
>>>>>> Labels: flaky
>>>>>>
>>>>>> [~benjaminhindman]: If I recall correctly, this is related to
>>>>>> MESOS-1658. Unfortunately, we don't have a stacktrace for SIGPIPE
>>>>>> currently:
>>>>>> {noformat}
>>>>>> [ RUN ] IO.Write
>>>>>> make[5]: *** [check-local] Broken pipe: 13
>>>>>> {noformat}
>>>>>> Running in gdb, seems to always occur here:
>>>>>> {code}
>>>>>> Program received signal SIGPIPE, Broken pipe.
>>>>>> [Switching to process 56827 thread 0x60b]
>>>>>> 0x00007fff9a011132 in __psynch_cvwait ()
>>>>>> (gdb) where
>>>>>> #0 0x00007fff9a011132 in __psynch_cvwait ()
>>>>>> #1 0x00007fff903e7ea0 in _pthread_cond_wait ()
>>>>>> #2 0x000000010062f27c in Gate::arrive (this=0x101908a10, old=14780) at gate.hpp:82
>>>>>> #3 0x0000000100600888 in process::schedule (arg=0x0) at src/process.cpp:1373
>>>>>> #4 0x00007fff903e72fc in _pthread_body ()
>>>>>> #5 0x00007fff903e7279 in _pthread_start ()
>>>>>> #6 0x00007fff903e54b1 in thread_start ()
>>>>>> {code}
>>>>>
>>>>> --
>>>>> This message was sent by Atlassian JIRA
>>>>> (v6.3.4#6332)
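
The remark quoted earlier in this thread, that sockets can avoid SIGPIPE via SO_NOSIGPIPE on OS X and MSG_NOSIGNAL on Linux, can be sketched roughly as follows. This is an illustration under stated assumptions, not code taken from libprocess: the helper name errnoFromSendToClosedPeer is hypothetical, and a local socketpair() stands in for a real connection.

```cpp
#include <cassert>
#include <cerrno>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: sends one byte on a connected socket whose peer has
// already been closed, suppressing SIGPIPE for that socket or call only.
// Returns the errno reported by send(), or 0 if the send succeeded.
int errnoFromSendToClosedPeer()
{
  int fds[2];
  int rc = socketpair(AF_UNIX, SOCK_STREAM, 0, fds);
  assert(rc == 0);
  (void) rc;

  int flags = 0;

#if defined(SO_NOSIGPIPE)
  // OS X / BSD: disable SIGPIPE once, as a per-socket option.
  int on = 1;
  rc = setsockopt(fds[0], SOL_SOCKET, SO_NOSIGPIPE, &on, sizeof(on));
  assert(rc == 0);
#elif defined(MSG_NOSIGNAL)
  // Linux: disable SIGPIPE per call, via a send() flag.
  flags = MSG_NOSIGNAL;
#endif

  close(fds[1]); // The peer goes away.

  ssize_t n = send(fds[0], "x", 1, flags);
  int error = (n == -1) ? errno : 0;

  close(fds[0]);
  return error;
}
```

With either mechanism the failed send() should report EPIPE without the process-wide signal disposition being touched. Pipes have no equivalent option or flag, which is why the proposal is to ignore SIGPIPE globally instead.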