[ 
https://issues.apache.org/jira/browse/MESOS-2768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14612762#comment-14612762
 ] 

Benjamin Mahler commented on MESOS-2768:
----------------------------------------

Rather than logging calls to os::close, we'd like the program to crash with a 
stack trace whenever os::close is called on one of libev's pipe fds, so that 
we get alerted and can see the offending call path.
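
A rough sketch of what such a guard could look like is below. It is only an 
illustration under stated assumptions: `ProtectedFds`, `protectedFds()` and 
`guardedClose()` are hypothetical names, not existing Mesos or libprocess 
APIs, and how the pipe fds would be obtained from libev and registered is 
left out.

```cpp
#include <unistd.h>

#include <cstdio>
#include <cstdlib>
#include <mutex>
#include <set>

// Hypothetical registry of descriptors that general-purpose code must never
// close, e.g. the two ends of the event loop's wakeup pipe.
class ProtectedFds
{
public:
  void add(int fd)
  {
    std::lock_guard<std::mutex> lock(mutex_);
    fds_.insert(fd);
  }

  bool contains(int fd)
  {
    std::lock_guard<std::mutex> lock(mutex_);
    return fds_.count(fd) > 0;
  }

private:
  std::mutex mutex_;
  std::set<int> fds_;
};

// Process-wide instance (function-local static, initialized on first use).
inline ProtectedFds& protectedFds()
{
  static ProtectedFds instance;
  return instance;
}

// Hypothetical stand-in for os::close: aborts instead of closing a
// protected descriptor, so the offending caller shows up in the stack trace.
inline int guardedClose(int fd)
{
  if (protectedFds().contains(fd)) {
    std::fprintf(stderr, "close() called on protected fd %d\n", fd);
    std::abort(); // Raises SIGABRT.
  }
  return ::close(fd);
}
```

Since the slave log below shows that SIGABRT already produces a stack trace, 
aborting at the point of the bad close should be enough to identify the 
caller.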

The bug seems to be on HEAD. The logs are already shared above and haven't 
provided any insight yet. :)

> SIGPIPE in process::run_in_event_loop()
> ---------------------------------------
>
>                 Key: MESOS-2768
>                 URL: https://issues.apache.org/jira/browse/MESOS-2768
>             Project: Mesos
>          Issue Type: Bug
>    Affects Versions: 0.23.0
>            Reporter: Yan Xu
>            Priority: Critical
>
> Observed in production.
> {noformat:title=slave log}
> I0526 12:17:48.027257 51633 slave.cpp:4077] Received a new estimation of the oversubscribable resources 
> W0526 12:17:48.027257 51636 logging.cpp:91] RAW: Received signal SIGPIPE; escalating to SIGABRT
> *** Aborted at 1432642668 (unix time) try "date -d @1432642668" if you are using GNU date ***
> PC: @     0x7fa58c23eb6d raise
> *** SIGABRT (@0xc9a5) received by PID 51621 (TID 0x7fa58224c940) from PID 51621; stack trace: ***
>     @     0x7fa58c23eca0 (unknown)
>     @     0x7fa58c23eb6d raise
>     @     0x7fa58cc19ba7 mesos::internal::logging::handler()
>     @     0x7fa58c23eca0 (unknown)
>     @     0x7fa58c23da2b __libc_write
>     @     0x7fa58cb57b6f evpipe_write.part.5
>     @     0x7fa58d245070 process::run_in_event_loop<>()
>     @     0x7fa58d2441ba process::EventLoop::delay()
>     @     0x7fa58d1c3c9c process::clock::scheduleTick()
>     @     0x7fa58d1c65b1 process::Clock::timer()
>     @     0x7fa58d23915a process::delay<>()
>     @     0x7fa58d23a740 process::ReaperProcess::wait()
>     @     0x7fa58d21261a process::ProcessManager::resume()
>     @     0x7fa58d2128dc process::schedule()
>     @     0x7fa58c23683d start_thread
>     @     0x7fa58ba28fcd clone
> {noformat}
> {noformat:title=gdb}
> (gdb) bt
> #0  0x00007fa58c23eb6d in raise () from /lib64/libpthread.so.0
> #1  0x00007fa58cc19ba7 in mesos::internal::logging::handler (signal=Unhandled 
> dwarf expression opcode 0xf3
> ) at logging/logging.cpp:92
> #2  <signal handler called>
> #3  0x00007fa58c23da2b in write () from /lib64/libpthread.so.0
> #4  0x00007fa58cb57b6f in evpipe_write (loop=0x7fa58e1e79c0, flag=Unhandled 
> dwarf expression opcode 0xfa
> ) at ev.c:2172
> #5  0x00007fa58d245070 in process::run_in_event_loop<Nothing>(const 
> std::function<process::Future<Nothing>()> &) (f=Unhandled dwarf expression 
> opcode 0xf3
> ) at src/libev.hpp:80
> #6  0x00007fa58d2441ba in process::EventLoop::delay(const Duration &, const 
> std::function<void()> &) (duration=Unhandled dwarf expression opcode 0xf3
> ) at src/libev.cpp:106
> #7  0x00007fa58d1c3c9c in process::clock::scheduleTick (timers=Unhandled 
> dwarf expression opcode 0xf3
> ) at src/clock.cpp:119
> #8  0x00007fa58d1c65b1 in process::Clock::timer(const Duration &, const 
> std::function<void()> &) (duration=Unhandled dwarf expression opcode 0xf3
> ) at src/clock.cpp:254
> #9  0x00007fa58d23915a in process::delay<process::ReaperProcess> 
> (duration=..., pid=Unhandled dwarf expression opcode 0xf3
> ) at ./include/process/delay.hpp:25
> #10 0x00007fa58d23a740 in process::ReaperProcess::wait (this=0x2056920) at 
> src/reap.cpp:93
> #11 0x00007fa58d21261a in process::ProcessManager::resume (this=0x1db8d20, 
> process=0x2056958) at src/process.cpp:2172
> #12 0x00007fa58d2128dc in process::schedule (arg=Unhandled dwarf expression 
> opcode 0xf3
> ) at src/process.cpp:602
> #13 0x00007fa58c23683d in start_thread () from /lib64/libpthread.so.0
> #14 0x00007fa58ba28fcd in clone () from /lib64/libc.so.6
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
