On Thu, 2011-03-03 at 13:03 -0700, Jim Schutt wrote:
> > If none of that works, it's possible that someone is calling exit()
> > somewhere. You can attach a gdb to the process and put a breakpoint on
> > exit() to see if this is going on. There's a lot of "your foo is not
> > bar enough, I hate your config, exit(1)" type code that gets executed
> > while the daemon is starting up. It sounds like you should be past
> > that point, though.
>
> I've finally gotten a little info, using a variant of
> your gdb idea: I waited until many of the OSD instances
> had died, then I attached gdb to several that were left,
> and waited.
>
> Two of them died the same way, like this:
>
> Program received signal SIGPIPE, Broken pipe.
> [Switching to Thread 0x7fd7888c8940 (LWP 28693)]
> 0x00007fd7a9b82f2b in sendmsg () from /lib64/libpthread.so.0
> (gdb) bt
> #0 0x00007fd7a9b82f2b in sendmsg () from /lib64/libpthread.so.0
> #1 0x0000000000672e0b in SimpleMessenger::Pipe::do_sendmsg (
> this=0x7fd799b67c20, sd=13, msg=0x7fd7888c7f20, len=251237, more=false)
> at msg/SimpleMessenger.cc:1994
> #2 0x00000000006739d3 in SimpleMessenger::Pipe::write_message (
> this=0x7fd799b67c20, m=0x7fd79b2dcb70) at msg/SimpleMessenger.cc:2217
> #3 0x000000000067e74a in SimpleMessenger::Pipe::writer (this=0x7fd799b67c20)
> at msg/SimpleMessenger.cc:1734
> #4 0x000000000066fa2b in SimpleMessenger::Pipe::Writer::entry (
> this=0x7fd799b67e70) at msg/SimpleMessenger.h:204
> #5 0x000000000068282e in Thread::_entry_func (arg=0x7fd799b67e70)
> at ./common/Thread.h:41
> #6 0x00007fd7a9b7b73d in start_thread (arg=<value optimized out>)
> at pthread_create.c:301
> #7 0x00007fd7a8a91f6d in clone () from /lib64/libc.so.6
> (gdb)
>
Has something changed in signal handling recently?
Maybe SIGPIPE used to be blocked, so sendmsg() would
just fail with EPIPE, but now it's neither blocked nor
handled? This bit in linux-2.6.git/net/core/stream.c is
what made me wonder, but maybe it's a red herring:

int sk_stream_error(struct sock *sk, int flags, int err)
{
	if (err == -EPIPE)
		err = sock_error(sk) ? : -EPIPE;
	if (err == -EPIPE && !(flags & MSG_NOSIGNAL))
		send_sig(SIGPIPE, current, 0);
	return err;
}

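In case it helps narrow this down, here is a minimal userspace sketch
(not Ceph code) of the behaviour that function implies: when the peer is
gone, sendmsg() fails with EPIPE, and SIGPIPE is raised only if
MSG_NOSIGNAL is not set. The helper names send_to_closed_peer() and
ignore_sigpipe() are made up for illustration, and the AF_UNIX
socketpair is only a stand-in for the real TCP connection, on the
assumption that it follows the same EPIPE/SIGPIPE convention.

```c
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: send one byte on a socket whose peer has already
 * closed. Returns the errno reported by sendmsg(), 0 if the send
 * unexpectedly succeeded, or -1 if the socketpair could not be created.
 */
int send_to_closed_peer(int flags)
{
    int sv[2];
    char byte[] = "x";
    struct iovec iov;
    struct msghdr msg;
    ssize_t n;
    int err;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    close(sv[1]);                       /* the peer goes away */

    iov.iov_base = byte;
    iov.iov_len = 1;
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;

    /* Without MSG_NOSIGNAL this also raises SIGPIPE in the caller. */
    n = sendmsg(sv[0], &msg, flags);
    err = (n < 0) ? errno : 0;

    close(sv[0]);
    return err;
}

/*
 * Hypothetical helper: ignore SIGPIPE process-wide, so a broken pipe
 * shows up only as an EPIPE error. Returns 0 on success, -1 on failure.
 */
int ignore_sigpipe(void)
{
    return (signal(SIGPIPE, SIG_IGN) == SIG_ERR) ? -1 : 0;
}
```

With MSG_NOSIGNAL, or after ignore_sigpipe(), I'd expect
send_to_closed_peer() to return EPIPE. With flags of 0 and the default
SIGPIPE disposition the process should be killed by the signal instead,
which would match the gdb trace above.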
-- Jim