On Wed, 1 Feb 2017, Eric Covener wrote:

On Wed, Feb 1, 2017 at 1:02 PM, Niklas Edmundsson <[email protected]> wrote:
This might be due to processes being cleaned up after hitting
MaxSpareThreads or MaxConnectionsPerChild; both are tuned so that this rarely
happens. It's just a wild guess, but I suspect this because the weird-looking
stack traces point towards use-after-free issues...

The backtraces of the other threads in the process might give a hint
if graceful proc shutdown is occurring -- e.g. one thread might have
join_workers / apr_thread_join in the stack.

Nothing obvious to me (grep -i join finds nothing in the backtraces). I have 9 coredumps, and their "thread apply all bt full" output adds up to 1.3 MB, which feels a bit much to post here, although I guess it would be small if I bzipped and attached it...

Before we look too hard in that direction though, I remembered that our httpd init script sets the stack size to 512k, down from the Linux default of 8MB (historical reasons). Might that be the easy explanation, i.e. threads overflowing the stack?
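
To confirm which stack limit the running httpd children actually inherited, something like the following could be used. It is only a sketch, assuming a Linux /proc filesystem; the function names are made up for illustration, and the pid has to be supplied by hand.

```python
import re
import sys

# Matches the "Max stack size" row of /proc/<pid>/limits (Linux only).
# Columns are: name, soft limit, hard limit, units.
_STACK_ROW = re.compile(r"^Max stack size\s+(\S+)\s+(\S+)", re.MULTILINE)


def _to_bytes(field):
    """Convert one limit column to an int, or None for 'unlimited'."""
    return None if field == "unlimited" else int(field)


def parse_stack_limit(limits_text):
    """Return (soft, hard) stack limits in bytes, or None if the row is absent."""
    match = _STACK_ROW.search(limits_text)
    if match is None:
        return None
    return _to_bytes(match.group(1)), _to_bytes(match.group(2))


def read_stack_limit(pid):
    """Read the limits of a live process; pid may also be the string 'self'."""
    with open(f"/proc/{pid}/limits") as handle:
        return parse_stack_limit(handle.read())


if __name__ == "__main__":
    # Usage: python stack_limit.py <pid of an httpd child>
    pid = sys.argv[1] if len(sys.argv) > 1 else "self"
    print(read_stack_limit(pid))
```

A soft limit of 524288 here would correspond to the 512k set by the init script. Note that this only reports the process limit; if ThreadStackSize is set in the httpd configuration, the worker threads get that size instead.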

We only started serving https this autumn, and recently saw a bump in usage due to the LineageOS mirror. It is entirely possible that this is triggered by a usage pattern change exposing some of our arcane habits ;)

Another observation is that these dumps seem to happen in groups, i.e.:

[Tue Jan 24 19:32:52.277623 2017] [core:notice] [pid 2520:tid 139983763429184] AH00051: child pid 425377 exit signal Bus error (7), possible coredump in /tmp
[Tue Jan 24 19:32:55.281211 2017] [core:notice] [pid 2520:tid 139983763429184] AH00051: child pid 29545 exit signal Bus error (7), possible coredump in /tmp
[Tue Jan 24 19:32:56.282240 2017] [core:notice] [pid 2520:tid 139983763429184] AH00051: child pid 20749 exit signal Bus error (7), possible coredump in /tmp
[Tue Jan 24 19:32:58.285476 2017] [core:notice] [pid 2520:tid 139983763429184] AH00051: child pid 679374 exit signal Bus error (7), possible coredump in /tmp
[Wed Jan 25 00:14:29.743371 2017] [core:notice] [pid 2520:tid 139983763429184] AH00051: child pid 679441 exit signal Bus error (7), possible coredump in /tmp
[Wed Jan 25 00:14:38.753792 2017] [core:notice] [pid 2520:tid 139983763429184] AH00051: child pid 679547 exit signal Bus error (7), possible coredump in /tmp
[Tue Jan 31 15:29:04.767732 2017] [core:notice] [pid 2520:tid 139983763429184] AH00051: child pid 954024 exit signal Bus error (7), possible coredump in /tmp
[Tue Jan 31 15:29:09.773329 2017] [core:notice] [pid 2520:tid 139983763429184] AH00051: child pid 693010 exit signal Bus error (7), possible coredump in /tmp
[Tue Jan 31 15:29:18.782301 2017] [core:notice] [pid 2520:tid 139983763429184] AH00051: child pid 693899 exit signal Bus error (7), possible coredump in /tmp

Don't know what, if any, conclusions can be drawn from that though...
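
For what it's worth, the grouping can be checked mechanically across the whole error log. This is a rough sketch only: it assumes each AH00051 notice sits on a single line (as it does in the real log file), that timestamps use English day and month names, and the 60-second gap is an arbitrary threshold picked for illustration. The function names are made up.

```python
import re
from datetime import datetime, timedelta

# First [...] field is the timestamp; then pick the child pid out of AH00051.
NOTICE = re.compile(r"\[([^\]]+)\].*?AH00051: child pid (\d+) exit signal")

# A few of the notices quoted above, joined back onto single lines.
SAMPLE = [
    "[Tue Jan 24 19:32:52.277623 2017] [core:notice] [pid 2520:tid 139983763429184] AH00051: child pid 425377 exit signal Bus error (7), possible coredump in /tmp",
    "[Tue Jan 24 19:32:55.281211 2017] [core:notice] [pid 2520:tid 139983763429184] AH00051: child pid 29545 exit signal Bus error (7), possible coredump in /tmp",
    "[Wed Jan 25 00:14:29.743371 2017] [core:notice] [pid 2520:tid 139983763429184] AH00051: child pid 679441 exit signal Bus error (7), possible coredump in /tmp",
    "[Tue Jan 31 15:29:04.767732 2017] [core:notice] [pid 2520:tid 139983763429184] AH00051: child pid 954024 exit signal Bus error (7), possible coredump in /tmp",
]


def parse_crashes(lines):
    """Return a time-sorted list of (timestamp, child_pid) for AH00051 lines."""
    crashes = []
    for line in lines:
        match = NOTICE.search(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%a %b %d %H:%M:%S.%f %Y")
            crashes.append((stamp, int(match.group(2))))
    return sorted(crashes)


def group_crashes(crashes, max_gap=timedelta(seconds=60)):
    """Start a new group whenever the gap to the previous crash exceeds max_gap."""
    groups = []
    for stamp, pid in crashes:
        if groups and stamp - groups[-1][-1][0] <= max_gap:
            groups[-1].append((stamp, pid))
        else:
            groups.append([(stamp, pid)])
    return groups


if __name__ == "__main__":
    for group in group_crashes(parse_crashes(SAMPLE)):
        first, last = group[0][0], group[-1][0]
        print(f"{first} .. {last}: {len(group)} child(ren) {[p for _, p in group]}")
```

Comparing the start time of each group against the access log (or against graceful restarts, log rotation and the like) might show whether a particular request pattern or event precedes each burst.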


/Nikke - rambling a bit before falling asleep...
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se      |     [email protected]
---------------------------------------------------------------------------
 It's a port of call, home away from home...
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
