In Yann's v3 of the patch, it triggers only when I stop the server while the test case is ongoing,
Thread 2 Crashed: 0 libsystem_kernel.dylib 0x00007fff202e292e __pthread_kill + 10 1 libsystem_pthread.dylib 0x00007fff203115bd pthread_kill + 263 2 libsystem_c.dylib 0x00007fff202664ab __abort + 139 3 libsystem_c.dylib 0x00007fff20266420 abort + 135 4 httpd 0x000000010625bb72 ap_log_assert + 130 (log.c:1649) 5 mod_mpm_event.so 0x0000000106b73549 close_socket_nonblocking + 217 (event.c:569) 6 mod_mpm_event.so 0x0000000106b731f2 kill_connection + 64 (event.c:834) [inlined] 7 mod_mpm_event.so 0x0000000106b731f2 start_lingering_close_blocking + 178 (event.c:860) 8 mod_mpm_event.so 0x0000000106b72e52 process_socket + 1282 (event.c:1258) 9 mod_mpm_event.so 0x0000000106b72819 worker_thread + 809 (event.c:2390) 10 libapr-1.0.dylib 0x0000000106412341 dummy_worker + 17 (thread.c:147) 11 libsystem_pthread.dylib 0x00007fff203118fc _pthread_start + 224 12 libsystem_pthread.dylib 0x00007fff2030d443 thread_start + 15 [Tue Jul 13 14:12:18.800125 2021] [mpm_event:trace4] [pid 47786:tid 123145484374016] event.c(563): closing socket 15/7f95ba07b4a0 [Tue Jul 13 14:12:19.551039 2021] [mpm_event:trace4] [pid 47787:tid 123145488666624] event.c(1322): closing listeners (connection_count=0) [Tue Jul 13 14:12:19.551146 2021] [mpm_event:trace4] [pid 47786:tid 123145497251840] event.c(1322): closing listeners (connection_count=1) [Tue Jul 13 14:12:19.551176 2021] [mpm_event:trace4] [pid 47788:tid 123145488666624] event.c(1322): closing listeners (connection_count=0) [Tue Jul 13 14:12:19.551209 2021] [mpm_event:trace4] [pid 47786:tid 4476272128] event.c(563): closing socket 15/7f95ba07b4b0 [Tue Jul 13 14:12:19.552305 2021] [mpm_event:trace4] [pid 47792:tid 123145488666624] event.c(1322): closing listeners (connection_count=0) [Tue Jul 13 14:12:19.805283 2021] [mpm_event:trace4] [pid 47786:tid 123145484910592] event.c(563): closing socket -1/7f95ba07b4b0 [Tue Jul 13 14:12:19.805289 2021] [mpm_event:error] [pid 47786:tid 123145484910592] (9)Bad file descriptor: AH00468: error closing socket -1/7f95ba07b4b0 You can see that socket 7f95ba07b4b0 is closed twice. I attach my log and v5 patch (which is Yann's plus some extra log around closing).
error_log
Description: Binary data
event_ka_no_lingerv5.diff
Description: Binary data
> Am 13.07.2021 um 16:03 schrieb Stefan Eissing <[email protected]>: > > Wait a minute...need to confirm that I am not lying and used an older patch > version... > >> Am 13.07.2021 um 16:01 schrieb Stefan Eissing <[email protected]>: >> >> Found one condition where the assertion failure happens. The test case I >> have does this: >> >> sock = socket.create_connection((host, int(env.https_port))) >> try: >> sock.settimeout(0.5) >> sock.recv(1024) >> assert False >> except Exception as ex: >> print(f"as expected: {ex}") >> sock.close() >> >> Which is all that's needed to trigger the assertion. The server is not >> stopped. >> >> The log for this thread only gives: >> [Tue Jul 13 13:51:29.476747 2021] [mpm_event:trace4] [pid 31582:tid >> 123145480441856] event.c(858): [client 127.0.0.1:51768] lingering close in >> state 5 >> [Tue Jul 13 13:51:29.476769 2021] [mpm_event:trace4] [pid 31582:tid >> 123145480441856] event.c(833): [remote 127.0.0.1:51768] kill connection in >> state 5 >> [Tue Jul 13 13:51:29.476775 2021] [mpm_event:trace4] [pid 31582:tid >> 123145480441856] event.c(563): closing socket -1/7fa24681d0b0 from >> process_socket >> [Tue Jul 13 13:51:29.476779 2021] [mpm_event:error] [pid 31582:tid >> 123145480441856] (9)Bad file descriptor: AH00468: error closing socket >> -1/7fa24681d0b0 from process_socket >> [Tue Jul 13 13:51:29.476799 2021] [core:crit] [pid 31582:tid >> 123145480441856] AH00102: [Tue Jul 13 13:51:29 2021] file event.c, line 569, >> assertion "0" failed >> >> I added a trace4 "closing socket" before the apr_socket_close(), grepping >> for that in the log shows: >> [Tue Jul 13 13:51:28.467677 2021] [mpm_event:trace4] [pid 31582:tid >> 123145479905280] event.c(563): closing socket 15/7fa24681d0a0 from >> process_lingering_close >> [Tue Jul 13 13:51:29.476775 2021] [mpm_event:trace4] [pid 31582:tid >> 123145480441856] event.c(563): closing socket -1/7fa24681d0b0 from >> process_socket >> [Tue Jul 13 13:51:29.476779 2021] [mpm_event:error] [pid 31582:tid >> 123145480441856] (9)Bad file descriptor: AH00468: error closing socket >> -1/7fa24681d0b0 from process_socket >> [Tue Jul 13 13:51:29.984132 2021] [mpm_event:trace4] [pid 31584:tid >> 123145470783488] event.c(563): closing socket -1/7fa24681f0b0 from >> process_socket >> [Tue Jul 13 13:51:29.984138 2021] [mpm_event:error] [pid 31584:tid >> 123145470783488] (9)Bad file descriptor: AH00468: error closing socket >> -1/7fa24681f0b0 from process_socket >> >> >> >> Thread 21 Crashed: >> 0 libsystem_kernel.dylib 0x00007fff202e292e __pthread_kill + 10 >> 1 libsystem_pthread.dylib 0x00007fff203115bd pthread_kill + 263 >> 2 libsystem_c.dylib 0x00007fff202664ab __abort + 139 >> 3 libsystem_c.dylib 0x00007fff20266420 abort + 135 >> 4 httpd 0x000000010dbfdb72 ap_log_assert + 130 >> (log.c:1649) >> 5 mod_mpm_event.so 0x000000010e5154bf >> close_socket_nonblocking_ex + 223 (event.c:569) >> 6 mod_mpm_event.so 0x000000010e515158 kill_connection_ex + >> 67 (event.c:835) [inlined] >> 7 mod_mpm_event.so 0x000000010e515158 >> start_lingering_close_blocking_ex + 184 (event.c:862) >> 8 mod_mpm_event.so 0x000000010e514da9 process_socket + >> 1289 (event.c:1261) >> 9 mod_mpm_event.so 0x000000010e514769 worker_thread + 809 >> (event.c:2394) >> 10 libapr-1.0.dylib 0x000000010ddb9341 dummy_worker + 17 >> (thread.c:147) >> 11 libsystem_pthread.dylib 0x00007fff203118fc _pthread_start + 224 >> 12 libsystem_pthread.dylib 0x00007fff2030d443 thread_start + 15 >> >> >> >> - Stefan >> >>> Am 13.07.2021 um 12:39 schrieb Stefan Eissing >>> <[email protected]>: >>> >>> Coming back to this: I vote for applying this to trunk. On a fresh setup, I >>> do not see and assert failures in my test. >>> >>> - Stefan >>> >>>> Am 08.07.2021 um 20:17 schrieb Yann Ylavic <[email protected]>: >>>> >>>> On Thu, Jul 8, 2021 at 4:21 PM Stefan Eissing >>>> <[email protected]> wrote: >>>>> >>>>> I see many of those: >>>>> [Thu Jul 08 14:16:55.301670 2021] [mpm_event:error] [pid 81101:tid >>>>> 123145411510272] (9)Bad file descriptor: AH00468: error closing socket >>>>> -1/7ff9cf0086b0 from process_socket >>>>> >>>>> which come from event.c#1263: rc = start_lingering_close_blocking(cs); >>>>> calling event.c#864: kill_connection_ex(cs, from); >>>>> and event.c#837: close_socket_nonblocking_ex(cs->pfd.desc.s, from); >>>> >>>> OK, so I think this could be addressed by the attached patch. >>>> This is the same as v0 plus some changes in ap_start_lingering_close() >>>> (from server/connection.c) to not close the socket on error. >>>> ap_start_lingering_close() did not close the socket consistently, so >>>> the caller had to call apr_socket_close() anyway and sometimes fail.. >>>> The close on failure is moved to ap_lingering_close() which is the one >>>> that should care. >>>> >>>> Still the AP_DEBUG_ASSERT(0) triggering with this? >>>> <event_ka_no_lingerv3.diff> >>> >> >
