Bugs item #3585606, was opened at 2012-11-08 20:47
Message generated for change (Tracker Item Submitted) made by dmsanders
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=1086410&aid=3585606&group_id=232389

Please note that this message will contain a full copy of the comment
thread, including the initial issue submission, for this request,
not just the latest update.

Category: core
Group: 1.8.x
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: David Sanders (dmsanders)
Assigned to: Nobody/Anonymous (nobody)
Summary: TCP Deadlock

Initial Comment:
There is a serious deadlock issue when using TCP with OpenSIPS (1.8.0-tls). I found this paper, which reaches the same conclusion (but discusses OpenSER circa 2008):

http://www.cs.rice.edu/CS/Architecture/docs/ram-ispass08.pdf

I'll quote the relevant part of Section 6:

"This can lead to deadlock in the following situation. When a worker process requests a connection from the supervisor process, it then blocks waiting to receive that file descriptor. If, at the same time, the supervisor process blocks waiting to send a new connection to the same worker (since the buffer at the receiver is full), the two processes will deadlock. Once the supervisor process deadlocks, no other worker can make progress either, as they will quickly need their own connections from the supervisor process. Similarly, no new connections will be accepted. This clearly illustrates that in an event-driven server, one must be careful to only read from sockets when the event mechanism says there is something to read and only write to sockets when the event mechanism says there is space to write."

I can reliably reproduce this deadlock with any number of TCP children. Interestingly, it seems to happen faster with a larger number of children. Under constant load, once the main TCP process deadlocks, all the children will as well.

It seems to be rate related. Using SIPp to drive TCP traffic to an OpenSIPS server, 50 registers/second does not trigger the deadlock. However, if I increase the traffic load, a deadlock will occur within 30 seconds.

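The rule in the Section 6 quote above (only write when the event mechanism reports space to write) can be sketched in C roughly as follows. This is an illustrative sketch only, not OpenSIPS code: set_nonblocking, send_if_writable, and demo_full_buffer_does_not_block are hypothetical names, and a plain AF_UNIX socketpair stands in for the supervisor/worker channel. The actual channel also passes file descriptors, which this sketch does not attempt.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: switch a descriptor to non-blocking mode. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/*
 * Hypothetical helper: write only if poll() reports the socket writable.
 * Returns the number of bytes written (> 0), 0 if the socket is not
 * writable right now (the caller should queue the data and return to its
 * event loop), or -1 on error. It never blocks.
 */
static ssize_t send_if_writable(int fd, const void *buf, size_t len)
{
    struct pollfd pfd;
    int ready;
    ssize_t n;

    assert(buf != NULL && len > 0);

    pfd.fd = fd;
    pfd.events = POLLOUT;
    pfd.revents = 0;

    ready = poll(&pfd, 1, 0);           /* timeout 0: ask, do not wait */
    if (ready < 0)
        return -1;
    if (ready == 0)
        return 0;                       /* no space to write */
    if (pfd.revents & (POLLERR | POLLHUP | POLLNVAL))
        return -1;

    n = send(fd, buf, len, 0);
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return 0;                       /* lost the race: still not an error */
    return n;
}

/*
 * Demo: sv[0] plays the supervisor, sv[1] a worker that is not reading.
 * Returns 0 if a full buffer is reported instead of blocking, and a send
 * succeeds again once the worker has drained its side; -1 otherwise.
 */
int demo_full_buffer_does_not_block(void)
{
    int sv[2];
    char chunk[1024];
    char sink[4096];
    ssize_t n;
    long sent_chunks = 0;
    int result = -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    if (set_nonblocking(sv[0]) < 0 || set_nonblocking(sv[1]) < 0)
        goto out;
    memset(chunk, 'x', sizeof(chunk));

    /* Fill the channel; the loop ends when poll() says "no space". */
    while ((n = send_if_writable(sv[0], chunk, sizeof(chunk))) > 0)
        sent_chunks++;
    if (n < 0 || sent_chunks == 0)
        goto out;

    /* The worker finally reads everything that was queued. */
    while ((n = recv(sv[1], sink, sizeof(sink), 0)) > 0)
        ;
    if (n == 0 || (errno != EAGAIN && errno != EWOULDBLOCK))
        goto out;

    /* With space available again, the supervisor can write. */
    if (send_if_writable(sv[0], chunk, sizeof(chunk)) > 0)
        result = 0;
out:
    close(sv[0]);
    close(sv[1]);
    return result;
}
```

A blocking send() in the fill loop would hang at the point where this version returns 0, which is the supervisor half of the deadlock the paper describes.
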
My theory is that if the TCP children can't process a message and reply faster than messages are coming in (in this case, faster than 20 ms, i.e. one message every 20 ms at 50 registers/second), then the deadlock will occur.

For completeness, the GDB backtrace output of the deadlocked processes when running two TCP children is attached.

_______________________________________________
Devel mailing list
Devel@lists.opensips.org
http://lists.opensips.org/cgi-bin/mailman/listinfo/devel