This crash happens with the latest git source as of today, at version b367b7c5d.
I had this crash happen twice in a few hours after running the same environment
for more than a month without any change, so I guess it's not easy to reproduce.
The first time it crashed I got these entries in syslog for the process that
crashed:
----------------------
Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:send_fd: sendmsg failed
on 172: Bad file descriptor
Jun 30 02:29:10 node15 ./opensips[3573]: ERROR:core:send2worker: send_fd failed
Jun 30 02:29:10 node15 ./opensips[3573]: ERROR:core:handle_tcpconn_ev: no TCP
workers available
Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_del:
Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_del:
Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_del:
Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_del: [TCP_main]
check failed after successful fd del (fd=221,flags=0, sflags=1) over map
(fd=-1,type=0,data=(nil),flags=536870912) erase=1
Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_add:
Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_add:
Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_add:
Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_add: [TCP_main]
check failed after successful fd add
(fd=222,type=19,data=0x7ff16c0da018,flags=1) already=0
Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_del:
Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_del:
Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_del:
Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_del: [TCP_main]
check failed after successful fd del (fd=222,flags=16, sflags=3) over map
(fd=-1,type=0,data=(nil),flags=0) erase=1
Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_add:
Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_add:
Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_add:
Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_add:
Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_add: [TCP_main]
check failed after failed fd add (fd=220,type=19,data=0x7ff16c1042a8,flags=1)
already=-1
Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_add:
Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_add:
Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_add:
Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_add: [TCP_main]
check failed after successful fd add
(fd=222,type=19,data=0x7ff16c1042a8,flags=1) already=0
Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_del:
Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_del:
Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_del:
Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_del: [TCP_main]
check failed after successful fd del (fd=222,flags=16, sflags=3) over map
(fd=-1,type=0,data=(nil),flags=0) erase=1
Jun 30 02:29:14 node15 ./opensips[3573]: CRITICAL:core:fm_free_dbg: freeing
already freed shm pointer (0x7ff16c1042a8), first free: net/net_tcp.c:
_tcpconn_rm(747) - aborting!
Jun 30 02:29:16 node15 ./opensips[3552]: INFO:core:handle_sigs: child process
3573 exited by a signal 6
----------------------
and the backtrace reads:
(gdb) bt
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x00007ff18de72535 in __GI_abort () at abort.c:79
#2 0x000056070c29962f in fm_free_dbg (fm=<optimized out>, p=<optimized out>,
file=<optimized out>,
func=<optimized out>, line=<optimized out>) at mem/f_malloc_dyn.h:221
#3 0x000056070c4b34ee in _shm_free (file=0x56070c6485a0 "net/net_tcp.c",
line=747, function=<synthetic pointer>,
ptr=0x7ff16c1042a8) at net/../mem/shm_mem.h:448
#4 _tcpconn_rm (c=c@entry=0x7ff16c1042a8) at net/net_tcp.c:747
#5 0x000056070c4d221e in __tcpconn_lifetime (force=0) at net/net_tcp.c:1608
#6 tcp_main_server () at net/net_tcp.c:1695
#7 0x000056070c4dc41a in tcp_start_listener () at net/net_tcp.c:2114
#8 0x000056070bfb5eba in main_loop () at main.c:813
#9 main (argc=<optimized out>, argv=<optimized out>) at main.c:1480
At the same time a second process crashed with the following in syslog (this
looks like the main process trying to do cleanup):
-----------------------
Jun 30 02:29:16 node15 ./opensips[3552]: INFO:core:handle_sigs: child process
3573 exited by a signal 6
Jun 30 02:29:16 node15 ./opensips[3552]: INFO:core:handle_sigs: core was
generated
Jun 30 02:29:16 node15 ./opensips[3552]: INFO:core:handle_sigs: terminating due
to SIGCHLD
Jun 30 02:29:16 node15 ./opensips[3552]: INFO:core:shutdown_opensips: process
1(3554) [JSON-RPC sender] terminated, still waiting for 15 more
Jun 30 02:29:16 node15 ./opensips[3552]: INFO:core:shutdown_opensips: process
2(3555) [MI Datagram] terminated, still waiting for 14 more
Jun 30 02:29:16 node15 ./opensips[3552]: INFO:core:shutdown_opensips: process
3(3556) [MI FIFO] terminated, still waiting for 13 more
Jun 30 02:29:16 node15 ./opensips[3552]: INFO:core:shutdown_opensips: process
4(3557) [time_keeper] terminated, still waiting for 12 more
Jun 30 02:29:16 node15 ./opensips[3552]: INFO:core:shutdown_opensips: process
5(3558) [timer] terminated, still waiting for 11 more
Jun 30 02:29:16 node15 ./opensips[3552]: INFO:core:shutdown_opensips: process
12(3565) [TCP receiver] terminated, still waiting for 10 more
Jun 30 02:29:16 node15 ./opensips[3552]: INFO:core:shutdown_opensips: process
13(3566) [TCP receiver] terminated, still waiting for 9 more
Jun 30 02:29:16 node15 ./opensips[3552]: INFO:core:shutdown_opensips: process
11(3564) [TCP receiver] terminated, still waiting for 8 more
Jun 30 02:29:16 node15 ./opensips[3552]: INFO:core:shutdown_opensips: process
15(3570) [TCP receiver] terminated, still waiting for 7 more
Jun 30 02:29:19 node15 ./opensips[3552]: INFO:core:shutdown_opensips: process
9(3562) [SIP receiver udp:81.23.228.160:5060] terminated, still waiting for 6
more
Jun 30 02:29:21 node15 ./opensips[3552]: INFO:core:cleanup: cleanup
Jun 30 02:29:21 node15 ./opensips[3552]:
INFO:db_mysql:switch_state_to_disconnected: disconnect event for 0x60b00000b9e0
Jun 30 02:29:21 node15 ./opensips[3552]: INFO:db_mysql:reset_all_statements:
resetting all statements on connection: (0x60d000005b40) 0x60b00000b9e0
Jun 30 02:29:21 node15 ./opensips[3552]: INFO:db_mysql:connect_with_retry:
re-connected successful for 0x60b00000b9e0
Jun 30 02:29:21 node15 ./opensips[3552]: NOTICE:presence:destroy: destroy
module ...
Jun 30 02:29:21 node15 ./opensips[3552]: NOTICE:event_jsonrpc:destroy: destroy
module ...
Jun 30 02:29:21 node15 ./opensips[3552]: CRITICAL:core:fm_free_dbg: freeing
already freed shm pointer (0x7ff16c10a6f0), first free: net/net_tcp.c:
_tcpconn_rm(747) - aborting!
-----------------------
and this backtrace:
(gdb) bt
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x00007ff18de72535 in __GI_abort () at abort.c:79
#2 0x000056070c29962f in fm_free_dbg (fm=<optimized out>, p=<optimized out>,
file=<optimized out>,
func=<optimized out>, line=<optimized out>) at mem/f_malloc_dyn.h:221
#3 0x000056070c4b34ee in _shm_free (file=0x56070c6485a0 "net/net_tcp.c",
line=747, function=<synthetic pointer>,
ptr=0x7ff16c10a6f0) at net/../mem/shm_mem.h:448
#4 _tcpconn_rm (c=0x7ff16c10a6f0) at net/net_tcp.c:747
#5 0x000056070c4d5f12 in __tcpconn_lifetime (force=1) at net/net_tcp.c:1608
#6 tcp_destroy () at net/net_tcp.c:1811
#7 0x000056070c1e551a in cleanup (show_status=show_status@entry=1) at
main.c:360
#8 0x000056070c1e69aa in shutdown_opensips (status=status@entry=134) at
main.c:522
#9 0x000056070c1e7ca0 in handle_sigs () at main.c:605
#10 0x000056070bfb65fe in main_loop () at main.c:867
#11 main (argc=<optimized out>, argv=<optimized out>) at main.c:1480
The second time syslog didn't have much detail except the critical double free
error. This time it also had 2 crashed processes. The first one had this:
------------------------
Jun 30 03:11:47 node15 ./opensips[18284]: CRITICAL:core:fm_free_dbg: freeing
already freed shm pointer (0x7ff8ed303b38), first free: net/net_tcp.c:
_tcpconn_rm(747) - aborting!
------------------------
and backtrace:
(gdb) bt
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x00007ff90f1f2535 in __GI_abort () at abort.c:79
#2 0x00005584676ff62f in fm_free_dbg (fm=<optimized out>, p=<optimized out>,
file=<optimized out>,
func=<optimized out>, line=<optimized out>) at mem/f_malloc_dyn.h:221
#3 0x00005584679194ee in _shm_free (file=0x558467aae5a0 "net/net_tcp.c",
line=747, function=<synthetic pointer>,
ptr=0x7ff8ed303b38) at net/../mem/shm_mem.h:448
#4 _tcpconn_rm (c=c@entry=0x7ff8ed303b38) at net/net_tcp.c:747
#5 0x0000558467919d4a in tcpconn_destroy (tcpconn=0x7ff8ed303b38) at
net/net_tcp.c:1005
#6 0x000055846792a707 in handle_tcp_worker (fd_i=<optimized out>,
tcp_c=0x6160000039c8) at net/net_tcp.c:1335
#7 handle_io (fm=<optimized out>, idx=idx@entry=22,
event_type=event_type@entry=1) at net/net_tcp.c:1525
#8 0x0000558467937831 in io_wait_loop_epoll (h=<optimized out>, t=5, repeat=0)
at net/../io_wait_loop.h:280
#9 tcp_main_server () at net/net_tcp.c:1695
#10 0x000055846794241a in tcp_start_listener () at net/net_tcp.c:2114
#11 0x000055846741beba in main_loop () at main.c:813
#12 main (argc=<optimized out>, argv=<optimized out>) at main.c:1480
and the main process had:
---------------------
Jun 30 03:11:49 node15 ./opensips[18255]: INFO:core:handle_sigs: child process
18284 exited by a signal 6
Jun 30 03:11:49 node15 ./opensips[18255]: INFO:core:handle_sigs: core was
generated
Jun 30 03:11:49 node15 ./opensips[18255]: INFO:core:handle_sigs: terminating
due to SIGCHLD
Jun 30 03:11:49 node15 ./opensips[18255]: INFO:core:shutdown_opensips: process
2(18258) [MI Datagram] terminated, still waiting for 15 more
Jun 30 03:11:49 node15 ./opensips[18255]: INFO:core:shutdown_opensips: process
1(18257) [JSON-RPC sender] terminated, still waiting for 14 more
Jun 30 03:11:49 node15 ./opensips[18255]: INFO:core:shutdown_opensips: process
12(18273) [TCP receiver] terminated, still waiting for 13 more
Jun 30 03:11:49 node15 ./opensips[18255]: INFO:core:shutdown_opensips: process
14(18278) [TCP receiver] terminated, still waiting for 12 more
Jun 30 03:11:49 node15 ./opensips[18255]: INFO:core:shutdown_opensips: process
3(18259) [MI FIFO] terminated, still waiting for 11 more
Jun 30 03:11:49 node15 ./opensips[18255]: INFO:core:shutdown_opensips: process
4(18260) [time_keeper] terminated, still waiting for 10 more
Jun 30 03:11:49 node15 ./opensips[18255]: INFO:core:shutdown_opensips: process
13(18276) [TCP receiver] terminated, still waiting for 9 more
Jun 30 03:11:49 node15 ./opensips[18255]: INFO:core:shutdown_opensips: process
5(18261) [timer] terminated, still waiting for 8 more
Jun 30 03:11:54 node15 ./opensips[18255]: INFO:core:cleanup: cleanup
Jun 30 03:11:56 node15 ./opensips[18255]:
INFO:db_mysql:switch_state_to_disconnected: disconnect event for 0x60b00002fed0
Jun 30 03:11:56 node15 ./opensips[18255]: INFO:db_mysql:reset_all_statements:
resetting all statements on connection: (0x60d000005b40) 0x60b00002fed0
Jun 30 03:11:56 node15 ./opensips[18255]: INFO:db_mysql:connect_with_retry:
re-connected successful for 0x60b00002fed0
Jun 30 03:11:56 node15 ./opensips[18255]:
INFO:db_mysql:db_mysql_do_prepared_query: reconnected to mysql server ->
re-init the statement
Jun 30 03:11:56 node15 ./opensips[18255]: NOTICE:presence:destroy: destroy
module ...
Jun 30 03:11:56 node15 ./opensips[18255]: NOTICE:event_jsonrpc:destroy: destroy
module ...
Jun 30 03:11:56 node15 ./opensips[18255]: CRITICAL:core:fm_free_dbg: freeing
already freed shm pointer (0x7ff8ed31b098), first free: net/net_tcp.c:
_tcpconn_rm(747) - aborting!
---------------------
and backtrace:
(gdb) bt
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x00007ff90f1f2535 in __GI_abort () at abort.c:79
#2 0x00005584676ff62f in fm_free_dbg (fm=<optimized out>, p=<optimized out>,
file=<optimized out>,
func=<optimized out>, line=<optimized out>) at mem/f_malloc_dyn.h:221
#3 0x00005584679194ee in _shm_free (file=0x558467aae5a0 "net/net_tcp.c",
line=747, function=<synthetic pointer>,
ptr=0x7ff8ed31b098) at net/../mem/shm_mem.h:448
#4 _tcpconn_rm (c=0x7ff8ed31b098) at net/net_tcp.c:747
#5 0x000055846793bf12 in __tcpconn_lifetime (force=1) at net/net_tcp.c:1608
#6 tcp_destroy () at net/net_tcp.c:1811
#7 0x000055846764b51a in cleanup (show_status=show_status@entry=1) at
main.c:360
#8 0x000055846764c9aa in shutdown_opensips (status=status@entry=134) at
main.c:522
#9 0x000055846764dca0 in handle_sigs () at main.c:605
#10 0x000055846741c5fe in main_loop () at main.c:867
#11 main (argc=<optimized out>, argv=<optimized out>) at main.c:1480
--
Dan
_______________________________________________
Devel mailing list
[email protected]
http://lists.opensips.org/cgi-bin/mailman/listinfo/devel