stujin wrote:
>  I work on a high-traffic site that uses apache/mod_perl, and we're
>  seeing some occasional segmentation faults and bus errors in our
>  apache error logs.  These errors sometimes result in the entire apache
>  process group going down, though it seems to me that the problems
>  originate within one of apache's child processes (maybe shared memory
>  is getting corrupted somehow?).
> 
>  I've searched through the archive of this list for similar situations,
>  and I found a lot of questions about seg faults, but none quite
>  matching our problem.
> 
>  We installed some signal handlers in our perl code that trap SIGSEGV
>  and SIGBUS and then dump a perl stack trace to a log file (see below).
>  Using this stack information, we can track the point of failure to a
>  call to perl's "fork()" inside the IPC::Open3 standard module.  Since
>  it seems very unlikely that fork() is broken, we're speculating that
>  there's some funny business going on prior to the fork that's putting
>  the process into an unstable state which prevents it from forking
>  successfully.

I think your analysis is correct.  Unfortunately, I've seen things like
this happen before under heavy loads and never truly determined the
cause of the problem(s).  Maybe some XS code, maybe a problem in Perl...

I suggest building a test case, using LWP, ab, or a similar tool, that
can trigger a segfault within a few tries.  Then remove parts of your
application one at a time until the segfaults stop.  When you find the
problem code, try doing the same thing a different way.
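
A rough sketch of such a test case with LWP follows.  It is only an
illustration, not something taken from a real setup: the script name,
the default request count, and the sub names are made up, and it assumes
your error log uses Apache's usual "child pid N exit signal ..." notice
when a child dies.  It sends a batch of requests and then counts the
crash notices that were logged during the run.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count log lines that report a child killed by SIGSEGV or SIGBUS.
# Assumes Apache's usual "child pid N exit signal ..." notice format.
sub count_crashes {
    my ($fh) = @_;
    my $n = 0;
    while (my $line = <$fh>) {
        $n++ if $line =~ /exit signal (?:Segmentation fault|Bus error)/;
    }
    return $n;
}

# Send $count GET requests to $url; returns how many did not succeed.
sub hammer {
    my ($url, $count) = @_;
    require LWP::UserAgent;    # loaded lazily; only needed for real runs
    my $ua = LWP::UserAgent->new(timeout => 30);
    my $failed = 0;
    for (1 .. $count) {
        $failed++ unless $ua->get($url)->is_success;
    }
    return $failed;
}

# Hypothetical usage: stress.pl URL ERROR_LOG [COUNT]
sub main {
    my ($url, $log, $count) = @_;
    $count ||= 500;                      # arbitrary default
    my $start = (-s $log) || 0;          # remember where the log ended
    my $failed = hammer($url, $count);
    open my $fh, '<', $log or die "Cannot open $log: $!";
    seek $fh, $start, 0;                 # skip entries from before this run
    my $crashes = count_crashes($fh);
    close $fh;
    print "$count requests, $failed failed, $crashes crashes logged\n";
    return $crashes;
}

main(@ARGV) if @ARGV >= 2;
```

This sends requests one at a time, so it may not reproduce a problem
that only shows up under concurrency.  Running several copies at once,
or using ab with a concurrency setting, is probably closer to real load.
Rerun it after each piece of the application you remove and watch for
the crash count dropping to zero.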

The last time we found one of these it seemed to be related to a couple
of cleanup handlers.  I rewrote some things to use $r->pnotes and avoid
the cleanup handlers and the segfaults went away.  Yes, pretty much
voodoo.  If you have some good C hackers who can sink their teeth into
your gdb trace, you might find the actual source of the problem, but
these things seem to be very elusive and may depend on timing of various
interactions.

- Perrin
