On 23/10/2021 02.49, miim wrote:
I have a relatively simple module which is nonetheless causing Apache to 
intermittently segfault.

I've added debugging trace messages to be sent to the error log, but the lack of anything 
in the log at the time of the segfault leads me to think that the error log is not 
flushed when a message is sent.  For example, a segfault occurs at 00:18:04, last 
previous request was at 00:15:36, so clearly the new request caused the segfault.   But 
not even the "Here I am at the handler entry point" (see below) gets into the 
logfile before the server log reports a segfault taking down Apache.


   /* Retrieve the per-server configuration */
   mod_bc_config *bc_scfg = ap_get_module_config(r->server->module_config,
                                           &bridcheck_module);
   if (bc_scfg->bc_logdebug & 0x0020000000000)
      ap_log_rerror(APLOG_MARK, APLOG_NOTICE, 0, r,
                    "mod_bridcheck: Enter bridcheck_handler");


I could turn on core dumping but (a) I am no expert at decoding core dumps and 
(b) I don't want to dump this problem on somebody else.

So ... is there a way to force Apache to flush the error log before proceeding?

Hello,

I don't think this is a log-flushing problem. When a segfault occurs, death is sudden: the OS kills the process, which has little chance to handle the error itself.

I am very confident, almost 100% sure, that if you don't see the message in the log then the execution has simply not reached it, the segfault happened before.

In my opinion it is easier to learn four or five gdb commands than to try to do anything at the moment the segfault occurs. The only way to prevent the death of the process is to install a handler for the SIGSEGV signal in your module (see "man signal" or "man sigaction"), but there's not much you can do in the signal handler. As I said, it is much, much easier to activate core dumps and learn a few commands.

Here's how I do it typically:

Debian/Ubuntu distributions put a file named envvars in /etc/apache2. If you have such a distribution, edit it as I show below. If not, make sure you get the same effect by other means.

I put the following two lines:

ulimit -c unlimited
echo 1 > /proc/sys/kernel/core_uses_pid

The first line is a shell builtin saying that there should be no size limit on the core file. If you don't have /etc/apache2/envvars then this command should be executed in the shell from which you launch apache, so that the apache process inherits this setting.

The second command instructs the kernel to add the process id to the name of the core file. Thus, if two apache children dump core at the same time, you get two different core files instead of a single file into which the kernel writes both cores, which would make it unusable. If you don't have /etc/apache2/envvars then you can execute this command in any shell, but you need root privileges in order to write to /proc/sys/kernel/core_uses_pid.
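
Before waiting for the next crash, it may be worth checking that both settings took effect. Here is a minimal sketch, assuming a Linux system with /proc mounted; show_core_settings is a hypothetical helper name. The limit it prints is the one of the shell that runs it, so run it from the same place that launches apache. Note also that the kernel only consults core_uses_pid when core_pattern does not already contain %p, and that some distributions set core_pattern to pipe cores into a helper program, in which case no core.PID file shows up in the working directory.

```shell
# Print the settings that decide whether and where a core file is written.
# show_core_settings is a hypothetical helper name, not part of apache.
show_core_settings() {
    # Limit inherited by processes started from this shell
    # ("unlimited" or a size; 0 means no core file is written).
    echo "core size limit: $(ulimit -c)"
    for f in /proc/sys/kernel/core_uses_pid /proc/sys/kernel/core_pattern; do
        if [ -r "$f" ]; then
            echo "$f: $(cat "$f")"
        else
            echo "$f: not readable"
        fi
    done
}
# Example: show_core_settings
```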

Let us assume you now have the core file and its name is core.12345, where 12345 is the process id of the apache child process that died.

Then I start gdb and I execute the following gdb commands at the gdb prompt:

file /usr/sbin/apache2
core-file core.12345
thread apply all bt

The first command loads the apache executable.
The second command loads the core file.
The third command displays the call stacks of all threads of the process (bt = backtrace).
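
If you only want the backtraces saved to a file, the same steps can probably be run without an interactive session. This is a sketch, assuming gdb is installed; dump_backtraces is a hypothetical helper name, and the paths in the example are the ones used above.

```shell
# Write the backtraces of all threads in a core file to standard output.
# dump_backtraces is a hypothetical helper name; both paths come from the caller.
dump_backtraces() {
    exe=$1
    core=$2
    if [ ! -r "$exe" ] || [ ! -r "$core" ]; then
        echo "usage: dump_backtraces EXECUTABLE COREFILE" >&2
        return 2
    fi
    # -batch: run the given commands, then exit without prompting.
    # -ex:    execute one gdb command, as if typed at the gdb prompt.
    # Giving the executable and the core file as arguments has the same
    # effect as the "file" and "core-file" commands.
    gdb -batch -ex "thread apply all bt" "$exe" "$core"
}
# Example: dump_backtraces /usr/sbin/apache2 core.12345 > backtrace.txt
```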

You can switch between threads with the command
thread N

where N is the numerical id of the thread you want to switch to.

Once you're in a thread, you can move up and down the call stack with the commands "up" and "down". If you compiled your module with debug symbols then you can inspect variables with the "print" command, e.g. "print bc_scfg". If, for example, the segfault occurred somewhere in a libc function, such as malloc, free, strcpy, etc, you may move up the call chain to the caller of the libc function, to inspect its arguments.

Besides the necessary "-g" compiler switch for adding debugging symbols, I typically add the "-fno-inline -O0" switches. These prevent any code optimisation. When I step through code in a debugger (on a live program, obviously, not a core file) the instructions are really executed in the order written in the program and not rearranged for speed.
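
If the module is built with apxs, these switches can be handed to the compiler with -Wc. This is a sketch under assumptions: apxs is on the PATH (some distributions name it apxs2), and the source file is called mod_bridcheck.c, a name I only guess from the snippet in the question. build_debug_module and the APXS variable are hypothetical.

```shell
# Compile a module with debug symbols and without optimisation.
# build_debug_module and APXS are hypothetical names; set APXS=apxs2
# if that is what your distribution installs.
build_debug_module() {
    src=${1:-mod_bridcheck.c}   # assumed file name, adjust to your module
    # -c compiles the module; each -Wc,<flag> passes <flag> to the C compiler.
    "${APXS:-apxs}" -c -Wc,-g -Wc,-O0 -Wc,-fno-inline "$src"
}
# Example: build_debug_module mod_bridcheck.c
```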

You may also debug a live program. "Normal" programs, when debugging, are typically launched directly in the debugger. This is not really advisable in apache, because it forks. What I do is to let apache start normally ("apache2ctl start" or "systemctl start apache2") and then attach the debugger to a live apache child process. I launch gdb, then I execute the following commands at the gdb prompt:

attach N (where N is the process id of the apache child)
break my_handler (set a breakpoint at one of my functions)
cont (let the process continue its execution until it reaches the breakpoint and I get the command prompt back)
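
The attach steps can also be given on the gdb command line. This is a sketch, assuming gdb is installed and that you have permission to trace the process (usually this means running as root); attach_and_break is a hypothetical helper name, and my_handler stands for one of your own functions.

```shell
# Attach gdb to a running process, set a breakpoint and let it continue.
# attach_and_break is a hypothetical helper name.
attach_and_break() {
    pid=$1
    func=$2
    # kill -0 sends no signal; it only checks that the process exists
    # and that we are allowed to signal it.
    if ! kill -0 "$pid" 2>/dev/null; then
        echo "no such process (or no permission): $pid" >&2
        return 1
    fi
    # -p attaches to the running process; the -ex commands then run in
    # order, and gdb returns to its prompt when the breakpoint is hit.
    gdb -p "$pid" -ex "break $func" -ex "cont"
}
# Example: ps -C apache2 -o pid,ppid,user    (pick the pid of a child)
#          attach_and_break 12345 my_handler
```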

When the breakpoint is reached I can inspect variables ("print variable") and execute step by step ("step" and "next").

HTH,
Sorin
