I've had problems in production where, after a while, some of the TCP listener 
opensips processes will get stuck and start consuming 100% CPU.  The only way 
to resolve it is to restart opensips.  This slowly starts to happen to each TCP 
listener until all are stuck and we stop serving requests.  Even while there 
are still a few functioning listeners, service is degraded and some clients to 
opensips times out.  I'm currently using the 1.11 branch at commit 
17c5990ee35691451008380f13b2ffcbc59ccf79.  I also observed this behavior on 
1.8, and was hoping upgrading to 1.11 would solve the issue.  Here's a typical 
stack trace while the process is stuck:

```
#0  0x00007f7d3596ee87 in sched_yield () from /lib64/libc.so.6
#1  0x00000000004cfa4b in get_lock (lock=0x7f7cb1e48390) at fastlock.h:182
#2  tcp_send (send_sock=0x7f7d334f5298, type=type@entry=3, 
    buf=buf@entry=0x7f7cba888c68 "CANCEL 
sip:[redacted]@[redacted];transport=TLS SIP/2.0\r\nVia: SIP/2.0/TLS 
[redacted]:5061;branch=z9hG4bK0d93.b6c7a621.0;i=dc65\r\nVia: SIP/2.0/TLS 
[redacted]:51370;r"..., len=len@entry=559, to=to@entry=0x7f7cb4d4ba50, 
id=id@entry=0)
    at tcp_main.c:1239
#3  0x00007f7d32419707 in msg_send (len=559, 
    buf=0x7f7cba888c68 "CANCEL sip:[redacted]@[redacted];transport=TLS 
SIP/2.0\r\nVia: SIP/2.0/TLS 
[redacted]:5061;branch=z9hG4bK0d93.b6c7a621.0;i=dc65\r\nVia: SIP/2.0/TLS 
[redacted]:51370;r"..., 
    id=0, to=0x7f7cb4d4ba50, proto=3, send_sock=<optimized out>)
    at ../../forward.h:128
#4  send_pr_buffer (rb=rb@entry=0x7f7cb4d4ba30, buf=0x7f7cba888c68, len=559)
    at t_funcs.c:69
#5  0x00007f7d3241eddf in t_forward_nonack (t=t@entry=0x7f7cb4d4b858, 
    p_msg=p_msg@entry=0x7f7d33cd8ed8, proxy=proxy@entry=0x0) at t_fwd.c:681
#6  0x00007f7d32419ccc in t_relay_to (p_msg=p_msg@entry=0x7f7d33cd8ed8, 
    proxy=proxy@entry=0x0, flags=flags@entry=0) at t_funcs.c:241
#7  0x00007f7d3243125d in w_t_relay (p_msg=0x7f7d33cd8ed8, 
    proxy=<optimized out>, flags=0x0) at tm.c:1206
#8  0x000000000041b071 in do_action (a=a@entry=0x7f7d3350b1b8, 
    msg=msg@entry=0x7f7d33cd8ed8) at action.c:1838
#9  0x0000000000420a39 in run_action_list (a=<optimized out>, 
    msg=0x7f7d33cd8ed8) at action.c:171
#10 0x000000000041df58 in do_action (a=a@entry=0x7f7d3350c3e0, 
    msg=msg@entry=0x7f7d33cd8ed8) at action.c:1105
#11 0x0000000000420a39 in run_action_list (a=<optimized out>, 
    msg=0x7f7d33cd8ed8) at action.c:171
#12 0x000000000041df58 in do_action (a=a@entry=0x7f7d3350c978, 
    msg=msg@entry=0x7f7d33cd8ed8) at action.c:1105
#13 0x0000000000420a39 in run_action_list (a=<optimized out>, 
    msg=0x7f7d33cd8ed8) at action.c:171
#14 0x000000000041efbd in do_action (a=a@entry=0x7f7d3350ca58, 
    msg=msg@entry=0x7f7d33cd8ed8) at action.c:1111
#15 0x0000000000420a39 in run_action_list (a=<optimized out>, 
    msg=0x7f7d33cd8ed8) at action.c:171
#16 0x000000000041efbd in do_action (a=a@entry=0x7f7d3350cb38, 
    msg=msg@entry=0x7f7d33cd8ed8) at action.c:1111
#17 0x0000000000420a39 in run_action_list (a=<optimized out>, 
    msg=0x7f7d33cd8ed8) at action.c:171
#18 0x000000000041efbd in do_action (a=a@entry=0x7f7d3350cc18, 
    msg=msg@entry=0x7f7d33cd8ed8) at action.c:1111
#19 0x0000000000420a39 in run_action_list (a=<optimized out>, 
    msg=0x7f7d33cd8ed8) at action.c:171
#20 0x000000000041efbd in do_action (a=a@entry=0x7f7d3350ccf8, 
    msg=msg@entry=0x7f7d33cd8ed8) at action.c:1111
#21 0x0000000000420c24 in run_action_list (msg=<optimized out>, 
    a=<optimized out>) at action.c:171
#22 run_actions (msg=0x7f7d33cd8ed8, a=<optimized out>) at action.c:136
#23 run_top_route (a=<optimized out>, msg=msg@entry=0x7f7d33cd8ed8)
    at action.c:211
#24 0x0000000000489143 in receive_msg (
    buf=buf@entry=0x876b08 <current_req+8> "CANCEL 
sip:[redacted]@[redacted];transport=TLS SIP/2.0\r\nVia: SIP/2.0/TLS 
[redacted]:51370;rport;branch=z9hG4bKPj4rfzZOxA8KjWkF2jX1wz5YLxkLNUxPsK\r\nMax-Forwards:
 69\r\nF"..., len=len@entry=423, rcv_info=rcv_info@entry=0x7fffc506a590)
    at receive.c:172
#25 0x00000000004d949f in tcp_read_req (con=con@entry=0x7f7cb2ff3bf8, 
    bytes_read=bytes_read@entry=0x7fffc506a670) at tcp_read.c:697
#26 0x00000000004da4e6 in handle_io (fm=0x7f7d3351f038, idx=idx@entry=-1, 
    event_type=1) at tcp_read.c:1033
#27 0x00000000004dc674 in io_wait_loop_epoll (h=<optimized out>, 
    t=<optimized out>, repeat=<optimized out>) at io_wait.h:845
#28 tcp_receive_loop (unix_sock=<optimized out>) at tcp_read.c:1141
#29 0x00000000004d4de4 in tcp_init_children (
    chd_rank=chd_rank@entry=0x836a50 <chd_rank.22745>, 
    startup_done=startup_done@entry=0x0) at tcp_main.c:2373
#30 0x0000000000416e24 in main_loop () at main.c:1024
#31 main (argc=<optimized out>, argv=<optimized out>) at main.c:1634
```
We're using TLS here, but the same thing happens when using unencrypted TCP.

Unfortunately, I've been unable to reproduce this under controlled conditions; 
it just seems to happen randomly in production.

---
Reply to this email directly or view it on GitHub:
https://github.com/OpenSIPS/opensips/issues/410
_______________________________________________
Devel mailing list
[email protected]
http://lists.opensips.org/cgi-bin/mailman/listinfo/devel

Reply via email to