Re: Chasing a segfault

Sorin Manolache Sat, 23 Oct 2021 14:01:22 -0700

On 23/10/2021 02.49, miim wrote:

I have a relatively simple module which is nonetheless causing Apache to 
intermittently segfault.


I've added debugging trace messages to be sent to the error log, but the lack of anything 
in the log at the time of the segfault leads me to think that the error log is not 
flushed when a message is sent.  For example, a segfault occurs at 00:18:04, last 
previous request was at 00:15:36, so clearly the new request caused the segfault.   But 
not even the "Here I am at the handler entry point" (see below) gets into the 
logfile before the server log reports a segfault taking down Apache.


   /* Retrieve the per-server configuration */
   mod_bc_config *bc_scfg = ap_get_module_config(r->server->module_config,
                                           &bridcheck_module);
   if (bc_scfg->bc_logdebug & 0x0020000000000)
      ap_log_rerror(APLOG_MARK, APLOG_NOTICE, 0, r,
                    "mod_bridcheck: Enter bridcheck_handler");


I could turn on core dumping but (a) I am no expert at decoding core dumps and 
(b) I don't want to dump this problem on somebody else.

So ... is there a way to force Apache to flush the error log before proceeding?


Hello,

I don't think it is a log-flushing problem. When a segfault occurs, death is sudden: the OS kills the process, which has little chance to handle the error itself.

I am very confident, almost 100% sure, that if you don't see the message in the log, then execution simply never reached it; the segfault happened before.
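The difference between "flushed" and "not flushed" comes down to where the bytes are when the process dies. In the C sketch below, `what_survives_a_crash` is a hypothetical name. A forked child writes one line with write(2) and one line through a fully buffered stdio stream, then kills itself with SIGSEGV. Only the write(2) line reaches the parent, because the stdio buffer lived in the dead process's memory. This rests on one assumption: httpd's file-based error log is written with plain write(2) calls through APR, with no user-space buffer. If that holds, a message that was actually logged before the crash should already be in the kernel's hands.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that writes one line with write(2) and one line through a
 * fully buffered stdio stream, then dies of SIGSEGV. Whatever reached the
 * pipe is copied into buf (NUL-terminated). Returns bytes received, or -1. */
ssize_t what_survives_a_crash(char *buf, size_t cap)
{
    int fds[2];
    if (cap == 0 || pipe(fds) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {                                    /* child */
        struct rlimit no_core = { .rlim_cur = 0, .rlim_max = 0 };
        setrlimit(RLIMIT_CORE, &no_core);              /* keep the demo from leaving a core file */
        close(fds[0]);

        FILE *buffered = fdopen(fds[1], "w");
        if (buffered == NULL)
            _exit(1);
        setvbuf(buffered, NULL, _IOFBF, BUFSIZ);       /* buffer lives in the child's memory */
        fputs("via stdio\n", buffered);                /* stays in that buffer */

        ssize_t w = write(fds[1], "via write(2)\n", 13); /* handed to the kernel right away */
        (void)w;

        signal(SIGSEGV, SIG_DFL);
        raise(SIGSEGV);                                /* "crash": the stdio buffer is never flushed */
        _exit(1);                                      /* not reached */
    }

    close(fds[1]);                                     /* parent: read until the child's end closes */
    size_t got = 0;
    ssize_t n;
    while (got < cap - 1 && (n = read(fds[0], buf + got, cap - 1 - got)) > 0)
        got += (size_t)n;
    buf[got] = '\0';
    close(fds[0]);

    int status;
    waitpid(pid, &status, 0);
    return (ssize_t)got;
}
```

With a 64-byte buffer, buf ends up holding only "via write(2)\n". The "via stdio" line dies with the child.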

In my opinion it is easier to learn four or five gdb commands than to do anything at all when the segfault occurs. There is only one way of preventing the death of the process, and that is to install a handler for the SIGSEGV signal in your module (see "man signal" or "man sigaction"). But there is not much you can do in the signal handler. As I said, it is much, much easier to enable core dumps and learn a few commands.
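If you still want the module to leave a trace when it crashes, the SIGSEGV handler must stay within async-signal-safe calls. That rules out ap_log_rerror, fprintf and malloc. The sketch below is one way to do it; `install_segv_note` and `segv_note` are hypothetical names. Where you call it from, for example a child_init hook, is up to you. The handler writes a fixed message to stderr with write(2), which httpd, as far as I know, redirects to the main error log. It then re-raises the signal, so the child still dies and still dumps core.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Runs on SIGSEGV. Only async-signal-safe functions are allowed here:
 * write(2) and raise(3) are; ap_log_rerror, fprintf and malloc are not. */
static void segv_note(int sig)
{
    static const char msg[] = "mod_bridcheck: caught SIGSEGV, dying\n";
    ssize_t w = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void)w;
    /* SA_RESETHAND already restored the default action, so re-raising
     * terminates the process (and produces a core, if enabled) as usual. */
    raise(sig);
}

/* Hypothetical helper: install segv_note as a one-shot SIGSEGV handler.
 * Returns 0 on success, -1 on error (errno set by sigaction). */
int install_segv_note(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = segv_note;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESETHAND;
    return sigaction(SIGSEGV, &sa, NULL);
}
```

The re-raise matters. If the handler just returned after a real fault, the faulting instruction would run again. In the sketch, SA_RESETHAND has already restored the default action before that can happen, so the process dies cleanly either way.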


Here's how I typically do it:

Debian/Ubuntu distributions ship a file named envvars in /etc/apache2. If you have such a distribution, edit it as shown below. If not, make sure you get the same effect by other means.


I put the following two lines:

ulimit -c unlimited
echo 1 > /proc/sys/kernel/core_uses_pid

The first line is a shell builtin saying that there should be no size limit on the core file. If you don't have /etc/apache2/envvars, then run this command in the shell from which you launch Apache, so that the Apache process inherits the setting.

The second command instructs the kernel to append the process id to the name of the core file. Thus, if two Apache children dump core at the same time, you get two different core files instead of a single file into which the kernel writes both cores, which makes it unusable. If you don't have /etc/apache2/envvars, you can run this command in any shell, but you need root privileges to write to /proc/sys/kernel/core_uses_pid.
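The two envvars lines act through the shell and the kernel. To check or reproduce their effect from C, for instance in a small test program, a sketch could look like the one below. `raise_core_limit` and `read_core_uses_pid` are hypothetical names. getrlimit/setrlimit with RLIMIT_CORE is the standard interface that `ulimit -c` uses underneath.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: raise this process's soft core-size limit to its
 * hard limit. Unlike "ulimit -c unlimited" run in a root shell, an
 * unprivileged process cannot go past the hard limit. Returns 0 on success. */
int raise_core_limit(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    rl.rlim_cur = rl.rlim_max;      /* RLIM_INFINITY if the hard limit is unlimited */
    return setrlimit(RLIMIT_CORE, &rl);
}

/* Hypothetical helper: return the current value of
 * /proc/sys/kernel/core_uses_pid, or -1 if it cannot be read (e.g. not Linux). */
int read_core_uses_pid(void)
{
    FILE *f = fopen("/proc/sys/kernel/core_uses_pid", "r");
    int value = -1;
    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}
```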

Let us assume you now have the core file and that its name is core.12345, where 12345 is the process id of the Apache child process that died.


Then I start gdb and I execute the following gdb commands at the gdb prompt:

file /usr/sbin/apache2
core-file core.12345
thread apply all bt

The first command loads the apache executable.
The second command loads the core file.

The third command displays the call stacks of all threads of the process (bt = backtrace).


You can switch between threads with the command
thread N

where N is the numerical id of the thread you want to switch to.

Once you're in a thread, you can move up and down the call stack with the commands "up" and "down". If you compiled your module with debug symbols, you can inspect variables with the "print" command, e.g. "print bc_scfg". If, for example, the segfault occurred somewhere in a libc function, such as malloc, free or strcpy, you can move up the call chain to the caller of the libc function to inspect its arguments.

Besides the necessary "-g" compiler switch for adding debugging symbols, I typically add the "-fno-inline -O0" switches. These turn off inlining and code optimisation. So when I execute step by step in a debugger (a live program, obviously, not a core file), the instructions really are executed in the order written in the program and are not rearranged for speed.

You can also debug a live program. "Normal" programs are typically launched directly in the debugger. This is not really advisable with Apache, because it forks. What I do is let Apache start normally ("apache2ctl start" or "systemctl start apache2") and then attach the debugger to a live Apache child process. I launch gdb, then I execute the following commands at the gdb prompt:


attach N (where N is the process id of the apache child)
break my_handler (set a breakpoint at one of my functions)

cont (let the process continue its execution until it reaches the breakpoint and I get the command prompt back)

When the breakpoint is reached, I can inspect variables ("print variable") and execute step by step ("step" and "next").
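Attaching only helps if you know which child will handle the request and it is still alive when you get to it. One trick worth trying goes beyond the steps above; the names below are hypothetical. At the top of the handler, log a notice so you can see the child's pid. Then park the child in a loop until gdb releases it: attach to that pid, set your breakpoints, run "set var debug_hold = 0" and then "cont". The timeout keeps a forgotten call from hanging a child forever.

```c
#include <signal.h>
#include <unistd.h>

/* Hypothetical debugging aid. While non-zero, wait_for_debugger() keeps the
 * process parked; clear it from gdb with:  set var debug_hold = 0  */
static volatile sig_atomic_t debug_hold = 1;

/* Sleep in one-second steps until a debugger clears debug_hold or
 * max_seconds have passed. Returns the number of seconds waited. */
unsigned wait_for_debugger(unsigned max_seconds)
{
    unsigned waited = 0;
    while (debug_hold && waited < max_seconds) {
        sleep(1);
        waited++;
    }
    return waited;
}
```

In the handler, this would sit right after a line such as ap_log_rerror(APLOG_MARK, APLOG_NOTICE, 0, r, "mod_bridcheck: pid %ld waiting for gdb", (long)getpid()), followed by wait_for_debugger(60). Remove both before going back to production.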


HTH,
Sorin
