Rainer,

the crashes look the same as the ones I see when running the mod-md tests: a
watchdog thread is still busy in OpenSSL while the process shuts down.

Yann wrote a patch a while back that solved it for me, but we were reluctant
to make that change. Maybe it's time to revisit that.
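
As a rough illustration of that kind of race and one generic way to avoid it: the sketch below is a toy, not Yann's patch and not mod_watchdog code, and every name in it (toy_watchdog, toy_worker, toy_watchdog_start, toy_watchdog_stop) is made up. It assumes a background thread that the owner can ask to stop and then join, so that no worker code can still be running once exit() starts its atexit handlers.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for a watchdog; not mod_watchdog's real data. */
typedef struct {
    pthread_t thread;
    atomic_bool stop;   /* set by the owner to request shutdown */
    atomic_int ticks;   /* counts loop iterations, for illustration only */
    bool started;       /* only touched by the owning thread */
} toy_watchdog;

/* Worker loop: does a bit of "work", then sleeps, until asked to stop. */
static void *toy_worker(void *arg)
{
    toy_watchdog *wd = arg;
    struct timespec pause = { 0, 10 * 1000 * 1000 }; /* 10 ms */

    assert(wd != NULL);
    while (!atomic_load(&wd->stop)) {
        /* A real watchdog might call into a library such as OpenSSL here. */
        atomic_fetch_add(&wd->ticks, 1);
        nanosleep(&pause, NULL);
    }
    return NULL;
}

/* Start the worker thread; returns 0 on success, else a pthread error. */
int toy_watchdog_start(toy_watchdog *wd)
{
    int rc;

    atomic_init(&wd->stop, false);
    atomic_init(&wd->ticks, 0);
    rc = pthread_create(&wd->thread, NULL, toy_worker, wd);
    wd->started = (rc == 0);
    return rc;
}

/*
 * Ask the worker to stop and wait for it. Once this has returned, the
 * worker can no longer race with anything that runs during exit().
 * Calling it again (or without a successful start) is a no-op.
 */
int toy_watchdog_stop(toy_watchdog *wd)
{
    int rc;

    if (!wd->started)
        return 0;
    atomic_store(&wd->stop, true);
    rc = pthread_join(wd->thread, NULL);
    wd->started = false;
    return rc;
}
```

In the traces below the SIGTERM handler runs on the watchdog thread itself and calls exit() from there, so a plain join like this does not map directly onto the prefork child; the sketch only shows the ordering that would have to hold somehow.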


> On 30.03.2020 at 01:22, Rainer Jung <[email protected]> wrote:
> 
> On 26.03.2020 at 15:50, Daniel Ruggeri wrote:
>> Hi, all;
>>    Please find below the proposed release tarball and signatures:
>> https://dist.apache.org/repos/dist/dev/httpd/
>> I would like to call a VOTE over the next few days to release this
>> candidate tarball as 2.4.43:
>> [X] +1: It's not just good, it's good enough!
>> [ ] +0: Let's have a talk.
>> [ ] -1: There's trouble in paradise. Here's what's wrong.
>> The computed digests of the tarball up for vote are:
>> sha1: 15d8605b094dfe5e283cd9e90770368dd14e26f2 *httpd-2.4.43.tar.gz
>> sha256: 2624e92d89b20483caeffe514a7c7ba93ab13b650295ae330f01c35d5b50d87f *httpd-2.4.43.tar.gz
>> sha512: d9879b8f8ef7d94dee1024e9c25b56d963a3b072520878a88a044629ad577c109a5456791b39016bf4f6672c04bf4a0e5cfd32381211e9acdc81d4a50b359e5e *httpd-2.4.43.tar.gz
>> The SVN tag is '2.4.43' at r1875715.
> 
> +1 to release and thanks a bunch for RM!
> 
> Summary: all OK except for
> 
> - very few shutdown crashes on Solaris (already observed in 2.4.37, but then 
> with MPM event when statically linked, now once with prefork and shared 
> linking). Happens in mod_watchdog. Maybe prefork doesn't expect another 
> thread running and doing deinit. gdb info at end.
> 
> - another shutdown crash on Solaris in mod_watchdog for prefork. gdb info at 
> end.
> 
> - sporadic hangs with prefork plus mod_ext_filter on Linux (see separate 
> thread).
> 
> Detailed report:
> 
> - Sigs and hashes OK
> - contents of tarballs identical
> - contents of tag and tarballs identical
>  except for expected deltas
> 
> Built on
> 
> - Solaris 10 Sparc as 32 Bit Binaries
> - SLES 11+12+15 (64 Bits)
> - RHEL 6+7 (64 Bits)
> 
> For all platforms built
> 
> - with default (shared) and static modules
> - with module set reallyall
> - using --enable-load-all-modules
> - against external APR/APU 1.7.0/1.6.1
>  plus APR/APU 1.6.5/1.6.1
>  plus APR/APU 1.7.x HEAD/1.7.x HEAD with expat
>  plus APR/APU 1.7.x HEAD/1.7.x HEAD with libxml2
>  plus APR/APU from deps tarball
> 
> - using external libraries
>  - expat 2.2.9
>  - pcre 8.44
>  - lua 5.3.5 (compiled with LUA_COMPAT_MODULE)
>  - libxml2 2.9.10
>  - libnghttp2 1.40.0
>  - brotli 1.0.7
>  - curl 7.69.1
>  - jansson 2.12
> and
>  - openssl 0.9.8zh, 1.0.2, 1.0.2u, 1.0.1e, 1.0.1l, 1.1.1, 1.1.1e plus patches 
> (head of a few days ago)
> 
> - Tool chain:
>    - platform gcc except on Solaris
>      (gcc 9.3.0 Solaris 10)
>    - CFLAGS: -O2 -g -Wall -fno-strict-aliasing
>      - on Solaris additionally -mcpu=v9, -D_XOPEN_SOURCE,
>        -D_XOPEN_SOURCE_EXTENDED=1, -D__EXTENSIONS__
>        and -D_XPG6
> 
> All 660 builds completed so far succeeded; a few are still ongoing.
> 
> - compiler warnings: only on Solaris (GCC 9.3.0):
> srclib/apr/locks/unix/proc_mutex.c:979:49: warning: 
> 'mutex_proc_pthread_cond_methods' defined but not used 
> [-Wunused-const-variable=]
> 
> Tested for
> 
> - Solaris 10, SLES 11+12+15, RHEL 6+7
> - MPMs prefork, worker, event
> - default and static module builds
> - log level trace8
> - module set reallyall (127 modules)
> - Perl client bundle build against OpenSSL 1.1.1, 1.1.0i, 1.0.2p and 0.9.8zh
> - OpenSSL once linked statically and once as a shared library
> 
> Every OpenSSL version in the client tested with 1.1.1e plus patches in the 
> server. Tests for more server OpenSSL versions are ongoing.
> 
> The total number of test suite runs was 680 (many more to come ...).
> 
> The following test failures were seen:
> 
> a Crashes only on Solaris, only with prefork MPM and
>  dynamically linked builds.
>  The crash seems to happen only at the end of a process, during pchild
>  cleanup, and it might be problematic that the watchdog thread still
>  exists at that time.
>  See gdb info at end.
> 
> b Tests 4, 8 and 12 of t/modules/buffer.t
>  Not a regression
>  Tests 4, 8 and sometimes 12, always line 37
>  Relatively frequent failures on Solaris 10 (93 times), old SLES 11
>  (114 times) and RHEL 6 (88 times), but not on the more modern
>  (and here faster) SLES 12 and RHEL 7.
>  Happens for all OpenSSL client and server
>  versions and all link types.
> 
> c Test 5 in t/modules/dav.t line 69:
>  Not a regression.
>  22 times: twice RHEL 6, 3 times RHEL 7 and 5 times SLES 11,
>  8 times SLES 12, twice SLES 15, twice Solaris 10.
>  Creation, modified and now times not in the correct order.
>  This seems to be a system issue, all tests done on NFS,
>  many tested on virtualized guests.
> 
> d Tests 45, 48, 51, 54 in t/modules/cgi.t line 232:
>  Not a regression
>  125 times once Solaris
>  Test checks log contents. Could be false positive due to
>  logs written to NFS.
> 
> Regards,
> 
> Rainer
> 
> GDB info (sporadic) Solaris shutdown crashes during OpenSSL shutdown in 
> mod_watchdog:
> 
> fedd7668 realfree (1c0268, 61, 60, 1bfc58, 0, 1c02d8) + 154
> fedd7d9c _free_unlocked (feeb92b0, 0, d86dc, feeb9330, feeb03d8, 1c99f8) + b0
> fedd7cd8 free     (1c99f8, fe9a3ad0, d871c, fe8e4ee0, feeb03d8, feeb3a20) + 24
> fe8e2390 OPENSSL_LH_free (1bce10, fe9a3ad0, feeb5900, 2, fe9a3ad0, 1cd450) + 64
> fe8bcf44 err_cleanup (0, f8800, feeb5900, fe9ed05c, fe8db4f0, fe9ecf4c) + 94
> fe8dee54 OPENSSL_cleanup (1, fe9ed284, fe9d3598, fe9a3240, fe9ed25c, fe9ed280) + 1e4
> fedc2374 _exithandle (feeb7500, feeb5900, 1c00, feeb9330, 24, 218910) + 40
> fedb0790 exit     (0, 218910, ff076cc8, 0, fce40200, 39b1a4) + 4
> fed62a18 clean_child_exit (0, 0, 0, 0, 0, 0) + 98
> fed62a3c just_die (f, 0, fcdfba70, 1, 0, 0) + 4
> fee4961c __sighndlr (f, 0, fcdfba70, fed62a38, 0, 1) + c
> fee3dce8 call_user_handler (f, 0, 0, 0, fce40200, fcdfba70) + 3b8
> fee3ded0 sigacthandler (f, 0, fcdfba70, 0, 0, 0) + 60
> --- called from signal handler with signal 15 (SIGTERM) ---
> fee4cdc0 __pollsys (fcdfbde8, 0, fcdfbe50, 0, 0, 0) + 8
> fede8590 pselect  (fcdfbde8, feeb4728, feeb4728, 0, fcdfbe50, 0) + 1c8
> fede8908 select   (0, 0, 0, 0, fcdfbeb8, f4240) + a0
> ff087d20 apr_sleep (0, 186a0, a0a84, a0a80, 0, 0) + 4c
> fe3e3030 wd_worker (fe3f9864, 3a40f8, 1, fcdfbf38, 5a1e9, d645b26e) + 344
> ff087274 dummy_worker (3a5790, fcdfc000, 0, 0, ff087268, 1) + c
> fee494f0 _lwp_start (0, 0, 0, 0, 0, 0)
> 
> Another crash in mod_watchdog, but with a separate stack:
> 
> -----------------  lwp# 1 / thread# 1  --------------------
> ff29f134 apr_pool_destroy (3524f8, 200, ffbfef98, 34ba68, 2000, 199c40) + 14c
> fef629e0 clean_child_exit (7, 22f, 3, 3, 9, cc858) + 60
> fef62f2c child_main (fef7b93c, fef7b938, 9cac4, fef7b954, fef7b944, 9c274) + 344
> fef635fc make_child (cc858, 6, 6, 3520c8, 1, 0) + 1d0
> fef645e4 prefork_run (0, ffbff160, ffbff148, fef7b94c, 9c274, fef7b95c) + 91c
> 0003a9c4 ap_run_mpm (a76e0, ce3b0, cc858, 9c0e4, 0, 1d30d0) + 54
> 00076224 main     (385b4, 9babc, 77168, 9c274, 9c260, a5768) + 9b4
> 00032200 _start   (0, 0, 0, 0, 0, 0) + 5c
> -----------------  lwp# 2 / thread# 2  --------------------
> ff042480 mutex_lock_impl (fcee0200, 0, 0, 0, fd6b7658, ff042a78) + 168
> fd6a65d8 __deregister_frame_info_bases (fd6b7688, 0, 0, 202, fd6b7670, 0) + d8
> fd6a0d80 ???????? (0, 1, fd6b7680, fd6b7a08, 0, fd6b7a0c)
> fd6a6b20 _fini    (ff3f418c, ff3f5b10, 2ae70, 0, ff3f48e8, 1821) + 4
> ff3c5a5c call_fini (ff3f418c, fe5e0018, fd6a6b1c, ff3f4380, ff3f4338, ff3f48e8) + cc
> ff3c5c2c atexit_fini (ff3f418c, 2ed28, ff042cc0, ff3f48e8, fcee0200, fe5e0018) + 78
> fefc2374 _exithandle (ff0b7500, ff0b5900, 1c00, ff0b9330, 24, 1d4088) + 40
> fefb0790 exit     (0, 1d4088, ff299ec8, 0, fcee0200, 34bacc) + 4
> fef62a18 clean_child_exit (0, 0, 0, 0, 0, 0) + 98
> fef62a3c just_die (f, 0, fcffba70, 1, 0, 0) + 4
> ff04961c __sighndlr (f, 0, fcffba70, fef62a38, 0, 1) + c
> ff03dce8 call_user_handler (f, 0, 0, 0, fcee0200, fcffba70) + 3b8
> ff03ded0 sigacthandler (f, 0, fcffba70, 0, 0, 0) + 60
> --- called from signal handler with signal 15 (SIGTERM) ---
> ff04cdc0 __pollsys (fcffbde8, 0, fcffbe50, 0, 0, 0) + 8
> fefe8590 pselect  (fcffbde8, ff0b4728, ff0b4728, 0, fcffbe50, 0) + 1c8
> fefe8908 select   (0, 0, 0, 0, fcffbeb8, f4240) + a0
> ff2ab9fc apr_sleep (0, 186a0, a1644, a1640, 0, 0) + 4c
> fe573030 wd_worker (fe589864, 34e4f8, 1, fcffbf38, 5a1ee, 6331aff1) + 344
> ff2aaf60 dummy_worker (3501b8, fcffc000, 0, 0, ff2aaf54, 1) + c
> ff0494f0 _lwp_start (0, 0, 0, 0, 0, 0)
