Hi William,

On Tue, Apr 09, 2019 at 01:54:03PM +0000, William Dauchy wrote:
> Hello,
> 
> Probably a useless report as I don't have a lot of information to provide,
> but we faced an issue where the unix socket was unresponsive, with the
> processes using all CPU (1600% with 16 nbthreads)
> 
> I only have the backtrace of the main process but lost the backtrace of
> all threads (nbthread 16). I was also unable to get a response from the
> socket.

Did you issue one of the commands that need to run alone, such as "show sess"
or "show fd"? It's possible that only one thread was blocked initially, and
that once the command started waiting for all threads to stop, the other
threads woke up one after the other to wait as well, all eating your CPU.
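
The failure mode described above can be modeled in a few lines of C. This
is a minimal sketch under stated assumptions, not HAProxy's implementation:
all names (worker, try_isolate, run_scenario, parked, isolate_req) are
hypothetical, and it assumes that a command which must run alone waits
until every other thread reaches a safe point between tasks, with the
threads that got there busy-waiting. One thread that never reaches the safe
point is then enough to keep the rendezvous from completing, while every
parked thread burns a full CPU.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

#define NTHREADS 3

/* Hypothetical model of a "run alone" rendezvous, not HAProxy code. */
static atomic_int  parked;       /* workers waiting at the safe point */
static atomic_bool isolate_req;  /* a "run alone" command is pending */
static atomic_bool stop;         /* tells workers to exit (demo only) */

static void *worker(void *arg)
{
    bool stuck = *(const bool *)arg;

    while (!atomic_load(&stop)) {
        if (stuck)
            continue;   /* models a thread looping forever inside a task */
        /* Healthy workers check for an isolation request between tasks. */
        if (atomic_load(&isolate_req)) {
            atomic_fetch_add(&parked, 1);
            /* Busy-wait until the requester is done: burns a full CPU. */
            while (atomic_load(&isolate_req) && !atomic_load(&stop))
                ;
            atomic_fetch_sub(&parked, 1);
        }
    }
    return NULL;
}

static double now(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);   /* C11; wall clock is fine for a demo */
    return (double)ts.tv_sec + ts.tv_nsec / 1e9;
}

/* Requests isolation and waits for all workers to park. The demo gives up
 * after `timeout` seconds so it can report the hang instead of hanging. */
static bool try_isolate(double timeout)
{
    double deadline = now() + timeout;

    atomic_store(&isolate_req, true);
    while (atomic_load(&parked) < NTHREADS)
        if (now() > deadline)
            return false;
    return true;
}

/* Starts NTHREADS workers, `nstuck` of which never reach the safe point,
 * and reports whether isolation succeeded. */
static bool run_scenario(int nstuck)
{
    pthread_t th[NTHREADS];
    bool stuck[NTHREADS];
    bool ok;

    atomic_store(&parked, 0);
    atomic_store(&isolate_req, false);
    atomic_store(&stop, false);

    for (int i = 0; i < NTHREADS; i++) {
        stuck[i] = i < nstuck;
        int rc = pthread_create(&th[i], NULL, worker, &stuck[i]);
        assert(rc == 0);
        (void)rc;
    }

    ok = try_isolate(2.0);

    /* Release everyone and shut the demo down. */
    atomic_store(&isolate_req, false);
    atomic_store(&stop, true);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(th[i], NULL);
    return ok;
}
```

Calling run_scenario(0) should succeed because every worker parks, while
run_scenario(1) only returns once the demo's timeout expires. A real
rendezvous would have no such timeout, which is how one blocked thread can
end up with every thread spinning.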

So the real question is why one thread was blocked.

> (gdb) bt
> #0  0x00005636716d7fbe in fwrr_set_server_status_up (srv=0x5636928e6700) at src/lb_fwrr.c:112

This is a spinlock, so it is apparently one of the culprits. The problem
is that I see no other place where this lock is taken and not released
in this part of the code, so it might have happened in another function.
If you ever manage to see this again and obtain a core, I'm interested in
seeing all threads' backtraces. In the meantime I'm carefully rechecking
all functions that take locks.
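
To illustrate why a backtrace may point at the thread waiting on a lock
rather than at the code that forgot to release it, here is a hedged C11
sketch of a spinlock leaked on an early-return path. All names (struct
server, server_init, set_up_buggy, set_up_fixed, the weight == 0 condition)
are hypothetical and are not taken from src/lb_fwrr.c; the point is only
that a function returning without unlocking leaves the next caller spinning
somewhere else entirely.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical minimal spinlock built on C11 atomic_flag. */
struct spinlock { atomic_flag flag; };

static void spin_lock(struct spinlock *l)
{
    /* Spin until the flag was previously clear: a waiter stuck here is
     * what shows up at 100% CPU in a backtrace. */
    while (atomic_flag_test_and_set_explicit(&l->flag, memory_order_acquire))
        ;
}

static bool spin_trylock(struct spinlock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->flag, memory_order_acquire);
}

static void spin_unlock(struct spinlock *l)
{
    atomic_flag_clear_explicit(&l->flag, memory_order_release);
}

/* Hypothetical server object protected by its own lock. */
struct server { struct spinlock lock; int weight; bool up; };

static void server_init(struct server *s, int weight)
{
    atomic_flag_clear(&s->lock.flag);   /* start unlocked */
    s->weight = weight;
    s->up = false;
}

/* Buggy: the early return leaves the lock held, so the next caller of
 * spin_lock() on this server spins forever, far from this function. */
static void set_up_buggy(struct server *s)
{
    spin_lock(&s->lock);
    if (s->weight == 0)
        return;                         /* BUG: missing spin_unlock() */
    s->up = true;
    spin_unlock(&s->lock);
}

/* Fixed: every exit path goes through a single unlock. */
static void set_up_fixed(struct server *s)
{
    spin_lock(&s->lock);
    if (s->weight != 0)
        s->up = true;
    spin_unlock(&s->lock);
}
```

After set_up_buggy() runs on a server with a zero weight, spin_trylock()
on that server fails because the lock is still held, whereas set_up_fixed()
leaves it free. Funnelling every exit through one unlock is one common way
to make such leaks easier to audit.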

Thanks,
Willy
