Yeah, I chatted with Alexander in person to clarify the actual semantics of SIGPIPE. We should be good to go here. Sorry for the delay; I will get back to these patches.
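For reference, the approach this thread converges on might look roughly like the sketch below: ignore SIGPIPE once for the whole process, then treat a broken pipe as an ordinary `EPIPE` error from `write()`. It is a minimal illustration assuming a POSIX system, not the code in the reviews linked further down. `ignore_sigpipe_globally()` and `write_all()` are hypothetical names.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: ignore SIGPIPE for the whole process, in the spirit
// of what the patches do once during libprocess initialization.
bool ignore_sigpipe_globally() {
  struct sigaction action {};
  action.sa_handler = SIG_IGN;
  sigemptyset(&action.sa_mask);
  return sigaction(SIGPIPE, &action, nullptr) == 0;
}

// Hypothetical helper: write the whole buffer, retrying on EINTR and short
// writes. Returns 0 on success, otherwise the errno value of the failed
// write(). With SIGPIPE ignored, a vanished reader shows up here as EPIPE.
int write_all(int fd, const char* data, std::size_t size) {
  while (size > 0) {
    const ssize_t n = ::write(fd, data, size);
    if (n < 0) {
      if (errno == EINTR) {
        continue;  // interrupted before writing anything; try again
      }
      return errno;
    }
    data += n;
    size -= static_cast<std::size_t>(n);
  }
  return 0;
}
```

The trade-off is that every write path has to check the return value, which matches Alexander's stated preference. In exchange, no write site needs any signal manipulation.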
On Fri, Nov 20, 2015 at 8:52 AM, James Peach <jor...@gmail.com> wrote:

>
> > On Nov 11, 2015, at 12:44 AM, Alexander Rojas <alexan...@mesosphere.io> wrote:
> >
> > What I meant is that we may not care about SIGPIPE (which tells us a pipe was broken) because we will be notified when we try to write into it anyway (on the writing side) and we will get an EOF on the reading side.
> >
> > The only reason I could see for us caring about SIGPIPE is if we want to know, as soon as the pipe breaks, that the event happened.
>
> So it sounds like there is no objection to this change? Can we land these changes now?
>
> >> On 06 Nov 2015, at 19:10, Benjamin Mahler <benjamin.mah...@gmail.com> wrote:
> >>
> >> To answer your questions:
> >>
> >> We use pipes when we need to communicate across the process boundary after a fork. Look for Subprocess::IO::Pipe for examples. There is plenty of code using pipes.
> >>
> >> Sockets aren't an issue, as one can avoid SIGPIPE on both OS X (SO_NOSIGPIPE) and Linux (MSG_NOSIGNAL).
> >>
> >> I'm a bit confused by your comment about the timing of SIGPIPE, which seems to suggest that the raising of SIGPIPE is not tied to the bad write call. Why do you think this?
> >>
> >> On Fri, Nov 6, 2015 at 4:37 AM, Alexander Rojas <alexan...@mesosphere.io> wrote:
> >>
> >>> I have multiple questions here:
> >>>
> >>> 1. Why do we use pipes at all? Or is SIGPIPE also raised when writing into sockets? Which leads me to:
> >>> 2. Do we use it only in test cases, or is there something actively using pipes?
> >>>
> >>> SIGPIPE itself is a weird signal, since a failed call to `write` also returns -1 and sets `errno` to `EPIPE`. So there are two ways to deal with errors when the reading process is no longer reading: one is handling the return value and `errno` (which usually means ignoring SIGPIPE), and the second is ignoring the return value and handling SIGPIPE.
> >>> The difference is that SIGPIPE is raised as soon as the OS realizes the pipe is broken, while the error on the write happens when you actually try to write on the pipe.
> >>>
> >>> All in all, I prefer to ignore the signal and deal with the return value of `write`.
> >>>
> >>>> On 06 Nov 2015, at 03:27, Benjamin Mahler <benjamin.mah...@gmail.com> wrote:
> >>>>
> >>>> Just want to surface this up to the dev@ thread to raise some awareness. Recently, with the SIGPIPE bug from libev [1], we've revisited whether it makes sense to continue down the path of leaving SIGPIPE unblocked and trying to handle it case by case.
> >>>>
> >>>> We originally wanted users of libprocess to decide on their own whether they want to ignore SIGPIPE. However, we'd like to reconsider:
> >>>>
> >>>> (a) The amount of code that is needed to work around SIGPIPE is substantial, especially because on OS X SIGPIPE appears not to be delivered synchronously [2]. Also, it is not possible to create pipes that don't surface SIGPIPE (unlike sockets), so in order to safely write to a pipe we need to wrap write() calls with signal suppression blocks (which we don't do in general!). You can get a sense of the code from [3] and [4].
> >>>>
> >>>> (b) SIGPIPE seems to be more of a legacy mechanism to shut down a set of piped programs, and the general recommendation seems to be to not bother with it and ignore it. Programs can handle EPIPE as they would other errors.
> >>>>
> >>>> Would love to hear if there are any concerns. I will be glad to shepherd James' changes here.
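To make point (a) concrete, a signal suppression block around a single `write()` could look like the sketch below. It is loosely based on the idea behind [4], not a copy of it. `write_suppressing_sigpipe()` is a hypothetical name, and the sketch assumes SIGPIPE is delivered synchronously to the writing thread, as on Linux (per [2], OS X appears not to behave this way). It also bears on the timing question: closing the read end raises nothing by itself, and the SIGPIPE only becomes pending as a result of the failing `write()`.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <pthread.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper (not stout's actual code): performs one write() with
// SIGPIPE blocked on the calling thread, discards any SIGPIPE that the write
// itself generated, then restores the previous signal mask. The write()
// result and errno are passed through unchanged. If `raised` is non-null it
// reports whether this call generated a SIGPIPE.
//
// Assumption: SIGPIPE is delivered synchronously to the writing thread (as
// on Linux). A process-directed SIGPIPE arriving concurrently is not handled
// here and may be swallowed.
ssize_t write_suppressing_sigpipe(int fd, const void* buf, std::size_t size,
                                  bool* raised = nullptr) {
  sigset_t pipe_set;
  sigemptyset(&pipe_set);
  sigaddset(&pipe_set, SIGPIPE);

  // If SIGPIPE is already pending, the caller already has it blocked;
  // leave it alone so we don't consume someone else's signal.
  sigset_t pending;
  sigpending(&pending);
  const bool was_pending = sigismember(&pending, SIGPIPE) == 1;

  sigset_t old_mask;
  pthread_sigmask(SIG_BLOCK, &pipe_set, &old_mask);

  const ssize_t result = ::write(fd, buf, size);
  const int saved_errno = errno;

  // With SIGPIPE blocked, a write to a pipe with no reader leaves the
  // signal pending on this thread instead of terminating the process.
  sigpending(&pending);
  const bool generated = !was_pending && sigismember(&pending, SIGPIPE) == 1;
  if (generated) {
    int sig = 0;
    sigwait(&pipe_set, &sig);  // consumes the pending SIGPIPE
  }
  if (raised != nullptr) {
    *raised = generated;
  }

  pthread_sigmask(SIG_SETMASK, &old_mask, nullptr);
  errno = saved_errno;
  return result;
}
```

Every pipe `write()` would need this wrapping, which is the overhead the proposal avoids by ignoring SIGPIPE globally.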
> >>>>
> >>>> [1] https://issues.apache.org/jira/browse/MESOS-2768
> >>>> [2] https://issues.apache.org/jira/browse/MESOS-2079
> >>>> [3] https://reviews.apache.org/r/39940/diff/1#index_header
> >>>> [4] https://github.com/apache/mesos/blob/0.25.0/3rdparty/libprocess/3rdparty/stout/include/stout/os/posix/signals.hpp#L101
> >>>>
> >>>> On Wed, Nov 4, 2015 at 9:20 AM, James Peach (JIRA) <j...@apache.org> wrote:
> >>>>
> >>>>>
> >>>>> [ https://issues.apache.org/jira/browse/MESOS-2079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14989947#comment-14989947 ]
> >>>>>
> >>>>> James Peach edited comment on MESOS-2079 at 11/4/15 5:19 PM:
> >>>>> -------------------------------------------------------------
> >>>>>
> >>>>> These patches globally ignore {{SIGPIPE}} during libprocess initialization, document {{SIGPIPE}} behavior a bit more, and remove various signal manipulations that were formerly necessary for disabling {{SIGPIPE}} delivery.
> >>>>>
> >>>>> https://reviews.apache.org/r/39938/
> >>>>> https://reviews.apache.org/r/39940/
> >>>>> https://reviews.apache.org/r/39941/
> >>>>>
> >>>>> was (Author: jamespeach):
> >>>>> https://reviews.apache.org/r/39938/
> >>>>> https://reviews.apache.org/r/39940/
> >>>>> https://reviews.apache.org/r/39941/
> >>>>>
> >>>>>> IO.Write test is flaky on OS X 10.10.
> >>>>>> -------------------------------------
> >>>>>>
> >>>>>> Key: MESOS-2079
> >>>>>> URL: https://issues.apache.org/jira/browse/MESOS-2079
> >>>>>> Project: Mesos
> >>>>>> Issue Type: Task
> >>>>>> Components: libprocess, technical debt, test
> >>>>>> Environment: OS X 10.10
> >>>>>> {noformat}
> >>>>>> $ clang++ --version
> >>>>>> Apple LLVM version 6.0 (clang-600.0.54) (based on LLVM 3.5svn)
> >>>>>> Target: x86_64-apple-darwin14.0.0
> >>>>>> Thread model: posix
> >>>>>> {noformat}
> >>>>>> Reporter: Benjamin Mahler
> >>>>>> Assignee: James Peach
> >>>>>> Labels: flaky
> >>>>>>
> >>>>>> [~benjaminhindman]: If I recall correctly, this is related to MESOS-1658. Unfortunately, we don't have a stacktrace for SIGPIPE currently:
> >>>>>> {noformat}
> >>>>>> [ RUN ] IO.Write
> >>>>>> make[5]: *** [check-local] Broken pipe: 13
> >>>>>> {noformat}
> >>>>>> Running in gdb, seems to always occur here:
> >>>>>> {code}
> >>>>>> Program received signal SIGPIPE, Broken pipe.
> >>>>>> [Switching to process 56827 thread 0x60b]
> >>>>>> 0x00007fff9a011132 in __psynch_cvwait ()
> >>>>>> (gdb) where
> >>>>>> #0 0x00007fff9a011132 in __psynch_cvwait ()
> >>>>>> #1 0x00007fff903e7ea0 in _pthread_cond_wait ()
> >>>>>> #2 0x000000010062f27c in Gate::arrive (this=0x101908a10, old=14780) at gate.hpp:82
> >>>>>> #3 0x0000000100600888 in process::schedule (arg=0x0) at src/process.cpp:1373
> >>>>>> #4 0x00007fff903e72fc in _pthread_body ()
> >>>>>> #5 0x00007fff903e7279 in _pthread_start ()
> >>>>>> #6 0x00007fff903e54b1 in thread_start ()
> >>>>>> {code}
> >>>>>
> >>>>> --
> >>>>> This message was sent by Atlassian JIRA
> >>>>> (v6.3.4#6332)
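For the socket case mentioned earlier in the thread, the two mechanisms (SO_NOSIGPIPE on OS X, MSG_NOSIGNAL on Linux) can be combined behind `#ifdef` checks as sketched below. The helper names are hypothetical. The sketch assumes each platform defines the corresponding macro only when it supports it.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: on platforms with a per-socket option (OS X/BSD),
// mark the socket so that writes to a closed peer never raise SIGPIPE.
// Returns true on success; where SO_NOSIGPIPE does not exist (Linux) there
// is nothing to set.
bool disable_sigpipe_on_socket(int fd) {
#ifdef SO_NOSIGPIPE
  int on = 1;
  return setsockopt(fd, SOL_SOCKET, SO_NOSIGPIPE, &on, sizeof(on)) == 0;
#else
  (void)fd;
  return true;
#endif
}

// Hypothetical helper: send() that asks for no SIGPIPE on this call where
// MSG_NOSIGNAL exists (Linux). Elsewhere it relies on
// disable_sigpipe_on_socket() having been called on the socket.
ssize_t send_no_sigpipe(int fd, const void* buf, std::size_t size) {
#ifdef MSG_NOSIGNAL
  const int flags = MSG_NOSIGNAL;
#else
  const int flags = 0;
#endif
  return ::send(fd, buf, size, flags);
}
```

SO_NOSIGPIPE is set once per socket while MSG_NOSIGNAL is passed on every `send()`. Neither applies to `write()` on a pipe, which is why pipes need either a global ignore or per-write suppression.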