I have found that Apache 2.4.7 with the 'event' MPM sometimes crashes while 
running our test suite. I discussed with Philip and Ben yesterday and they let 
me know that this a known issue with several 2.4.x versions, and is fixed in 
later versions (>= 2.4.10 ?). The Subversion I'm currently testing with is 
trunk@1646184.

The problem I want to highlight here is that our test suite doesn't report 
these crashes. On some runs it reports that some individual tests failed, and 
overall failure, but on other runs it reports no individual tests failed, and 
overall success, even though Apache crashed.

Do we want to make our test suite detect these crashes, and report overall 
failure even if each of the individual test scenarios completed successfully?

It seems to me we should. What do you think?


DETAILS

After a run of externals_tests.py --parallel that reported success on all 
tests, the error_log first contains this:

[mpm_event:error] ... AH00484: server reached MaxRequestWorkers setting, 
consider raising the MaxRequestWorkers setting

which doesn't seem to be a problem in itself, as I always get this even if it 
doesn't go on to crash, and also get a similar message when using 'worker' MPM 
and that doesn't go on to crash either.

Then it may contain one or more messages like this:

[core:notice] ... AH00051: child pid 23555 exit signal Segmentation fault 
(11)...

At the moment I don't know if any error indication is being returned to some 
part of the test runner and then not reported as test failure, or if searching 
the error_log is the only way to detect these crashes.

Here is how I run the tests:

$ (cd obj-dir/ && rm -rf subversion/tests/cmdline/httpd-20141217-* && 
APACHE_MPM=event $SVN_SRC/subversion/tests/cmdline/davautocheck.sh externals 
--cleanup --parallel; echo "Test suite returned: $?"; grep -i "Segmentation 
fault\|mpm_.*error" subversion/tests/cmdline/httpd-20141217-*/error_log)

Here is the entire error_log for one run, after stripping out 'authz_' messages:

[[[
[Wed Dec 17 12:11:39.605935 2014] [mpm_event:debug] [pid 9284:tid 
140516876347136] event.c(1949): AH02471: start_threads: Using epoll
[Wed Dec 17 12:11:39.606003 2014] [mpm_event:notice] [pid 9283:tid 
140516954560384] AH00489: Apache/2.4.7 (Ubuntu) SVN/1.9.0-dev configured -- 
resuming normal operations
[Wed Dec 17 12:11:39.606057 2014] [mpm_event:info] [pid 9283:tid 
140516954560384] AH00490: Server built: Jul 22 2014 14:36:38
[Wed Dec 17 12:11:39.606065 2014] [core:notice] [pid 9283:tid 140516954560384] 
AH00094: Command line: '/usr/sbin/apache2 -f 
/.../subversion/tests/cmdline/httpd-20141217-121139/cfg'
[Wed Dec 17 12:11:39.606080 2014] [mpm_event:debug] [pid 9285:tid 
140516876347136] event.c(1949): AH02471: start_threads: Using epoll
[Wed Dec 17 12:11:39.606877 2014] [mpm_event:debug] [pid 9287:tid 
140516876347136] event.c(1949): AH02471: start_threads: Using epoll
[Wed Dec 17 12:11:39.607031 2014] [mpm_event:debug] [pid 9289:tid 
140516876347136] event.c(1949): AH02471: start_threads: Using epoll
[Wed Dec 17 12:11:40.607116 2014] [mpm_event:error] [pid 9283:tid 
140516954560384] AH00484: server reached MaxRequestWorkers setting, consider 
raising the MaxRequestWorkers setting
[Wed Dec 17 12:11:43.610414 2014] [core:notice] [pid 9283:tid 140516954560384] 
AH00051: child pid 9287 exit signal Segmentation fault (11), possible coredump 
in /.../subversion/tests/cmdline/httpd-20141217-121139
[Wed Dec 17 12:11:44.611355 2014] [core:notice] [pid 9283:tid 140516954560384] 
AH00051: child pid 9289 exit signal Segmentation fault (11), possible coredump 
in /.../subversion/tests/cmdline/httpd-20141217-121139
[Wed Dec 17 12:11:44.611517 2014] [mpm_event:debug] [pid 13516:tid 
140516876347136] event.c(1949): AH02471: start_threads: Using epoll
[Wed Dec 17 12:11:45.614336 2014] [mpm_event:debug] [pid 15082:tid 
140516876347136] event.c(1949): AH02471: start_threads: Using epoll
]]]

Occasionally the operating system's crash report dialogue has popped up on one 
of these runs, but in most runs it doesn't. I don't know why.

FWIW, when I enabled core dumping (ulimit -c 100000) and loaded a core dump 
into GDB I saw:

(gdb) bt
 #0  process_socket (my_thread_num=0, ...) at event.c:1064
 #1  worker_thread (thd=<optimised out>, dummy=<optimised out>) at event.c:1815
 #2  0x00007fb76889f182 in start_thread (arg=0x7fb75ffff700) at 
pthread_create.c:312
 #3  0x00007fb7685cbfbd in clone () at 
../sysdeps/unix/sysv/linux/x86_64/clone.S:111

That was not the same test run that I reported above, and I haven't checked if 
the back-trace is similar every time.

- Julian

Reply via email to