This crash happens with the latest git source as of today, at version b367b7c5d.

I had this crash happen twice in a few hours after running the same environment 
for more than a month without any change, so I guess it's not easy to reproduce.

The first time it crashed I got these entries in syslog for the process that 
crashed:

----------------------
Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:send_fd: sendmsg failed 
on 172: Bad file descriptor
Jun 30 02:29:10 node15 ./opensips[3573]: ERROR:core:send2worker: send_fd failed
Jun 30 02:29:10 node15 ./opensips[3573]: ERROR:core:handle_tcpconn_ev: no TCP 
workers available
Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_del:
Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_del:
Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_del:
Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_del: [TCP_main] 
check failed after successful fd del (fd=221,flags=0, sflags=1) over map 
(fd=-1,type=0,data=(nil),flags=536870912) erase=1
Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_add:
Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_add:
Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_add:
Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_add: [TCP_main] 
check failed after successful fd add 
(fd=222,type=19,data=0x7ff16c0da018,flags=1) already=0
Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_del:
Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_del:
Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_del:
Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_del: [TCP_main] 
check failed after successful fd del (fd=222,flags=16, sflags=3) over map 
(fd=-1,type=0,data=(nil),flags=0) erase=1
Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_add:
Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_add:
Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_add:
Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_add:
Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_add: [TCP_main] 
check failed after failed fd add (fd=220,type=19,data=0x7ff16c1042a8,flags=1) 
already=-1
Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_add:
Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_add:
Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_add:
Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_add: [TCP_main] 
check failed after successful fd add 
(fd=222,type=19,data=0x7ff16c1042a8,flags=1) already=0
Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_del:
Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_del:
Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_del:
Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_del: [TCP_main] 
check failed after successful fd del (fd=222,flags=16, sflags=3) over map 
(fd=-1,type=0,data=(nil),flags=0) erase=1
Jun 30 02:29:14 node15 ./opensips[3573]: CRITICAL:core:fm_free_dbg: freeing 
already freed shm pointer (0x7ff16c1042a8), first free: net/net_tcp.c: 
_tcpconn_rm(747) - aborting!
Jun 30 02:29:16 node15 ./opensips[3552]: INFO:core:handle_sigs: child process 
3573 exited by a signal 6
----------------------

and the backtrace reads:

(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007ff18de72535 in __GI_abort () at abort.c:79
#2  0x000056070c29962f in fm_free_dbg (fm=<optimized out>, p=<optimized out>, 
file=<optimized out>,
    func=<optimized out>, line=<optimized out>) at mem/f_malloc_dyn.h:221
#3  0x000056070c4b34ee in _shm_free (file=0x56070c6485a0 "net/net_tcp.c", 
line=747, function=<synthetic pointer>,
    ptr=0x7ff16c1042a8) at net/../mem/shm_mem.h:448
#4  _tcpconn_rm (c=c@entry=0x7ff16c1042a8) at net/net_tcp.c:747
#5  0x000056070c4d221e in __tcpconn_lifetime (force=0) at net/net_tcp.c:1608
#6  tcp_main_server () at net/net_tcp.c:1695
#7  0x000056070c4dc41a in tcp_start_listener () at net/net_tcp.c:2114
#8  0x000056070bfb5eba in main_loop () at main.c:813
#9  main (argc=<optimized out>, argv=<optimized out>) at main.c:1480


At the same time a second process crashed with the following in syslog (this 
looks like the main process trying to do cleanup):

-----------------------
Jun 30 02:29:16 node15 ./opensips[3552]: INFO:core:handle_sigs: child process 
3573 exited by a signal 6
Jun 30 02:29:16 node15 ./opensips[3552]: INFO:core:handle_sigs: core was 
generated
Jun 30 02:29:16 node15 ./opensips[3552]: INFO:core:handle_sigs: terminating due 
to SIGCHLD
Jun 30 02:29:16 node15 ./opensips[3552]: INFO:core:shutdown_opensips: process 
1(3554) [JSON-RPC sender] terminated, still waiting for 15 more
Jun 30 02:29:16 node15 ./opensips[3552]: INFO:core:shutdown_opensips: process 
2(3555) [MI Datagram] terminated, still waiting for 14 more
Jun 30 02:29:16 node15 ./opensips[3552]: INFO:core:shutdown_opensips: process 
3(3556) [MI FIFO] terminated, still waiting for 13 more
Jun 30 02:29:16 node15 ./opensips[3552]: INFO:core:shutdown_opensips: process 
4(3557) [time_keeper] terminated, still waiting for 12 more
Jun 30 02:29:16 node15 ./opensips[3552]: INFO:core:shutdown_opensips: process 
5(3558) [timer] terminated, still waiting for 11 more
Jun 30 02:29:16 node15 ./opensips[3552]: INFO:core:shutdown_opensips: process 
12(3565) [TCP receiver] terminated, still waiting for 10 more
Jun 30 02:29:16 node15 ./opensips[3552]: INFO:core:shutdown_opensips: process 
13(3566) [TCP receiver] terminated, still waiting for 9 more
Jun 30 02:29:16 node15 ./opensips[3552]: INFO:core:shutdown_opensips: process 
11(3564) [TCP receiver] terminated, still waiting for 8 more
Jun 30 02:29:16 node15 ./opensips[3552]: INFO:core:shutdown_opensips: process 
15(3570) [TCP receiver] terminated, still waiting for 7 more
Jun 30 02:29:19 node15 ./opensips[3552]: INFO:core:shutdown_opensips: process 
9(3562) [SIP receiver udp:81.23.228.160:5060] terminated, still waiting for 6 
more
Jun 30 02:29:21 node15 ./opensips[3552]: INFO:core:cleanup: cleanup
Jun 30 02:29:21 node15 ./opensips[3552]: 
INFO:db_mysql:switch_state_to_disconnected: disconnect event for 0x60b00000b9e0
Jun 30 02:29:21 node15 ./opensips[3552]: INFO:db_mysql:reset_all_statements: 
resetting all statements on connection: (0x60d000005b40) 0x60b00000b9e0
Jun 30 02:29:21 node15 ./opensips[3552]: INFO:db_mysql:connect_with_retry: 
re-connected successful for 0x60b00000b9e0
Jun 30 02:29:21 node15 ./opensips[3552]: NOTICE:presence:destroy: destroy 
module ...
Jun 30 02:29:21 node15 ./opensips[3552]: NOTICE:event_jsonrpc:destroy: destroy 
module ...
Jun 30 02:29:21 node15 ./opensips[3552]: CRITICAL:core:fm_free_dbg: freeing 
already freed shm pointer (0x7ff16c10a6f0), first free: net/net_tcp.c: 
_tcpconn_rm(747) - aborting!
-----------------------

and this backtrace:

(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007ff18de72535 in __GI_abort () at abort.c:79
#2  0x000056070c29962f in fm_free_dbg (fm=<optimized out>, p=<optimized out>, 
file=<optimized out>,
    func=<optimized out>, line=<optimized out>) at mem/f_malloc_dyn.h:221
#3  0x000056070c4b34ee in _shm_free (file=0x56070c6485a0 "net/net_tcp.c", 
line=747, function=<synthetic pointer>,
    ptr=0x7ff16c10a6f0) at net/../mem/shm_mem.h:448
#4  _tcpconn_rm (c=0x7ff16c10a6f0) at net/net_tcp.c:747
#5  0x000056070c4d5f12 in __tcpconn_lifetime (force=1) at net/net_tcp.c:1608
#6  tcp_destroy () at net/net_tcp.c:1811
#7  0x000056070c1e551a in cleanup (show_status=show_status@entry=1) at 
main.c:360
#8  0x000056070c1e69aa in shutdown_opensips (status=status@entry=134) at 
main.c:522
#9  0x000056070c1e7ca0 in handle_sigs () at main.c:605
#10 0x000056070bfb65fe in main_loop () at main.c:867
#11 main (argc=<optimized out>, argv=<optimized out>) at main.c:1480


The second time syslog didn't have much detail except the critical double free 
error. This time it also had 2 crashed processes. The first one had this:

------------------------
Jun 30 03:11:47 node15 ./opensips[18284]: CRITICAL:core:fm_free_dbg: freeing 
already freed shm pointer (0x7ff8ed303b38), first free: net/net_tcp.c: 
_tcpconn_rm(747) - aborting!
------------------------

and backtrace:

(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007ff90f1f2535 in __GI_abort () at abort.c:79
#2  0x00005584676ff62f in fm_free_dbg (fm=<optimized out>, p=<optimized out>, 
file=<optimized out>,
    func=<optimized out>, line=<optimized out>) at mem/f_malloc_dyn.h:221
#3  0x00005584679194ee in _shm_free (file=0x558467aae5a0 "net/net_tcp.c", 
line=747, function=<synthetic pointer>,
    ptr=0x7ff8ed303b38) at net/../mem/shm_mem.h:448
#4  _tcpconn_rm (c=c@entry=0x7ff8ed303b38) at net/net_tcp.c:747
#5  0x0000558467919d4a in tcpconn_destroy (tcpconn=0x7ff8ed303b38) at 
net/net_tcp.c:1005
#6  0x000055846792a707 in handle_tcp_worker (fd_i=<optimized out>, 
tcp_c=0x6160000039c8) at net/net_tcp.c:1335
#7  handle_io (fm=<optimized out>, idx=idx@entry=22, 
event_type=event_type@entry=1) at net/net_tcp.c:1525
#8  0x0000558467937831 in io_wait_loop_epoll (h=<optimized out>, t=5, repeat=0) 
at net/../io_wait_loop.h:280
#9  tcp_main_server () at net/net_tcp.c:1695
#10 0x000055846794241a in tcp_start_listener () at net/net_tcp.c:2114
#11 0x000055846741beba in main_loop () at main.c:813
#12 main (argc=<optimized out>, argv=<optimized out>) at main.c:1480


and the main process had:

---------------------
Jun 30 03:11:49 node15 ./opensips[18255]: INFO:core:handle_sigs: child process 
18284 exited by a signal 6
Jun 30 03:11:49 node15 ./opensips[18255]: INFO:core:handle_sigs: core was 
generated
Jun 30 03:11:49 node15 ./opensips[18255]: INFO:core:handle_sigs: terminating 
due to SIGCHLD
Jun 30 03:11:49 node15 ./opensips[18255]: INFO:core:shutdown_opensips: process 
2(18258) [MI Datagram] terminated, still waiting for 15 more
Jun 30 03:11:49 node15 ./opensips[18255]: INFO:core:shutdown_opensips: process 
1(18257) [JSON-RPC sender] terminated, still waiting for 14 more
Jun 30 03:11:49 node15 ./opensips[18255]: INFO:core:shutdown_opensips: process 
12(18273) [TCP receiver] terminated, still waiting for 13 more
Jun 30 03:11:49 node15 ./opensips[18255]: INFO:core:shutdown_opensips: process 
14(18278) [TCP receiver] terminated, still waiting for 12 more
Jun 30 03:11:49 node15 ./opensips[18255]: INFO:core:shutdown_opensips: process 
3(18259) [MI FIFO] terminated, still waiting for 11 more
Jun 30 03:11:49 node15 ./opensips[18255]: INFO:core:shutdown_opensips: process 
4(18260) [time_keeper] terminated, still waiting for 10 more
Jun 30 03:11:49 node15 ./opensips[18255]: INFO:core:shutdown_opensips: process 
13(18276) [TCP receiver] terminated, still waiting for 9 more
Jun 30 03:11:49 node15 ./opensips[18255]: INFO:core:shutdown_opensips: process 
5(18261) [timer] terminated, still waiting for 8 more
Jun 30 03:11:54 node15 ./opensips[18255]: INFO:core:cleanup: cleanup
Jun 30 03:11:56 node15 ./opensips[18255]: 
INFO:db_mysql:switch_state_to_disconnected: disconnect event for 0x60b00002fed0
Jun 30 03:11:56 node15 ./opensips[18255]: INFO:db_mysql:reset_all_statements: 
resetting all statements on connection: (0x60d000005b40) 0x60b00002fed0
Jun 30 03:11:56 node15 ./opensips[18255]: INFO:db_mysql:connect_with_retry: 
re-connected successful for 0x60b00002fed0
Jun 30 03:11:56 node15 ./opensips[18255]: 
INFO:db_mysql:db_mysql_do_prepared_query: reconnected to mysql server -> 
re-init the statement
Jun 30 03:11:56 node15 ./opensips[18255]: NOTICE:presence:destroy: destroy 
module ...
Jun 30 03:11:56 node15 ./opensips[18255]: NOTICE:event_jsonrpc:destroy: destroy 
module ...
Jun 30 03:11:56 node15 ./opensips[18255]: CRITICAL:core:fm_free_dbg: freeing 
already freed shm pointer (0x7ff8ed31b098), first free: net/net_tcp.c: 
_tcpconn_rm(747) - aborting!
---------------------

and backtrace:

(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007ff90f1f2535 in __GI_abort () at abort.c:79
#2  0x00005584676ff62f in fm_free_dbg (fm=<optimized out>, p=<optimized out>, 
file=<optimized out>,
    func=<optimized out>, line=<optimized out>) at mem/f_malloc_dyn.h:221
#3  0x00005584679194ee in _shm_free (file=0x558467aae5a0 "net/net_tcp.c", 
line=747, function=<synthetic pointer>,
    ptr=0x7ff8ed31b098) at net/../mem/shm_mem.h:448
#4  _tcpconn_rm (c=0x7ff8ed31b098) at net/net_tcp.c:747
#5  0x000055846793bf12 in __tcpconn_lifetime (force=1) at net/net_tcp.c:1608
#6  tcp_destroy () at net/net_tcp.c:1811
#7  0x000055846764b51a in cleanup (show_status=show_status@entry=1) at 
main.c:360
#8  0x000055846764c9aa in shutdown_opensips (status=status@entry=134) at 
main.c:522
#9  0x000055846764dca0 in handle_sigs () at main.c:605
#10 0x000055846741c5fe in main_loop () at main.c:867
#11 main (argc=<optimized out>, argv=<optimized out>) at main.c:1480

--
Dan





_______________________________________________
Devel mailing list
[email protected]
http://lists.opensips.org/cgi-bin/mailman/listinfo/devel

Reply via email to