I've done more research on the issues you had with your very heavily loaded server, and here's a summary of some of the configurable parameters that affect how many connections a server can handle at one time.


Part I: The Operating System listen backlog limits

Network applications use the listen() system call to tell the operating system that they want to receive connections on a specific port. The operating system's TCP stack receives new connections and holds them until the application calls accept(), which hands the new connection to the userspace application. The TCP stack can hold only a limited number of new connections that have not yet been accept()ed by the application -- this is called the "listen backlog". Eons ago, the default maximum listen backlog for most Unices was 5; once there were 5 new connections in the listen queue that had not been accept()ed, all further incoming connections were dropped until the backlog fell below 5.

Most operating systems now have higher defaults, specified by the SOMAXCONN constant made available through the <sys/socket.h> header: the Linux 2.6 kernel and Mac OS X 10.4.x both set SOMAXCONN to 128. If your server receives new connections faster than your application can accept() them and the listen backlog builds to 128, further new connections will be dropped until the backlog falls below 128.
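As a minimal sketch of the listen() step described above (this is not AOLserver's code), the following C function opens a TCP socket and asks the kernel to start queuing connections with a requested backlog. The name open_listener and the choice of the loopback address are my own assumptions for illustration; SOMAXCONN is the constant from <sys/socket.h>.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: bind to 127.0.0.1:port (port 0 lets the OS pick one)
 * and start listening with the requested backlog.
 * Returns the listening descriptor, or -1 on error. */
int open_listener(uint16_t port, int backlog)
{
    struct sockaddr_in addr;
    int fd;

    assert(backlog >= 0);

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, backlog) < 0) {
        close(fd);
        return -1;
    }

    /* From here on, the kernel queues new connections until accept() is called. */
    return fd;
}
```

A caller that wants the largest compile-time default would use something like open_listener(8080, SOMAXCONN); the port number here is only an example.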

This doesn't mean you're stuck with this value; it is just what the operating system defaults to at boot time. You can change it on Solaris with:

/usr/sbin/ndd -set /dev/tcp tcp_conn_req_max_q 1024

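and on Linux with:

/sbin/sysctl -w net.core.somaxconn=1024

Note that the similarly named net.ipv4.tcp_max_syn_backlog parameter limits half-open connections (those that have not yet completed the TCP handshake), not completed connections waiting to be accept()ed. My Gentoo system sets tcp_max_syn_backlog to 1024 at boot time.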

My guess is that your operating system has max listen backlog set to 1024 or higher, but I don't have a Solaris system to check or test this (yet).
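One way to check the limit on a running Linux host is to read it from /proc. The sketch below assumes the Linux-specific path /proc/sys/net/core/somaxconn, which is where the kernel exposes its cap on listen() backlogs; the function name read_sysctl_long is hypothetical, and on Solaris or Mac OS X the file will not exist, so the function simply returns -1.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: read a single integer from a /proc/sys style file.
 * Returns the value, or -1 if the file is missing or cannot be parsed. */
long read_sysctl_long(const char *path)
{
    long value = -1;
    FILE *fp;

    assert(path != NULL);

    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    if (fscanf(fp, "%ld", &value) != 1)
        value = -1;
    fclose(fp);
    return value;
}
```

Calling read_sysctl_long("/proc/sys/net/core/somaxconn") on Linux should report the value currently in effect, including any change made with sysctl since boot.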


Part II: AOLserver listen backlog limits

You can also set the listen backlog on a per-application basis through the listen() call itself: the second argument to listen() is an integer giving the backlog you want for that application. If you request a backlog greater than the operating system's maximum, you get the operating system's value, not what you requested, and you won't know it, because the request is normally truncated silently to the lower operating system value.

If you set the listen backlog to less than the operating system's max listen backlog, you'll get what you asked for. On a system running several network server applications, this is a good way to keep any single application from hogging the listen backlog.
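The truncation rule above can be written down as a tiny model. This is only an illustration of the behavior described in this message, not the kernel's actual code, and the function name effective_backlog is my own.

```c
#include <assert.h>

/* Model of the rule described above: the application gets the backlog it
 * requested, unless that exceeds the operating system's maximum, in which
 * case it silently gets the maximum instead. */
int effective_backlog(int requested, int system_max)
{
    assert(requested >= 0);
    assert(system_max > 0);
    return requested > system_max ? system_max : requested;
}
```

With a system maximum of 128, a request for 4096 yields 128, while a request for 32 yields 32.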

AOLserver 3.5.6 limits the listen backlog to 32 new connections via the BACKLOG define in nsconf.h. If your server is getting new connections at a rate faster than AOLserver can accept them, and you reach the limit of 32 in the listen queue, connections will be dropped until the backlog drops below 32.
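To get a feel for how quickly a backlog of 32 can fill, here is a rough back-of-the-envelope model. It assumes constant arrival and accept rates, which real traffic will not have, and the rates in the example are made-up numbers, not measurements from any server.

```c
#include <assert.h>

/* Simplified model: seconds until an initially empty backlog fills when
 * connections arrive faster than they are accepted.
 * Returns -1.0 if the backlog never fills at these rates. */
double seconds_until_full(int backlog, double arrivals_per_sec,
                          double accepts_per_sec)
{
    double surplus;

    assert(backlog > 0);

    surplus = arrivals_per_sec - accepts_per_sec;
    if (surplus <= 0.0)
        return -1.0;
    return backlog / surplus;
}
```

For example, with 500 arrivals per second and 450 accepts per second, the surplus is 50 connections per second, so a backlog of 32 fills in 32 / 50 = 0.64 seconds.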

You can change AOLserver's listen backlog by creating the "listenbacklog" param in your nsd.tcl file and setting it to the integer value you want. I'm guessing that this param should be set in the server section, but I haven't validated this. You could also change it by editing the BACKLOG define and recompiling.


Part III: Thread and select()/poll() interactions

The bad news is that this may not be what you saw on your server. If connections are being dropped by the operating system before the application accepts them, the application never sees them or knows the connection attempts were made, and so could not log that a connection had been dropped. But, according to the notes in the accept() man page on Linux: "There may not always be a connection waiting after a SIGIO is delivered or select(2) or poll(2) return a readability event because the connection might have been removed by an asynchronous network error or another thread before accept() is called." It's quite possible that AOLserver was getting readability events but, when it went to process them, the connections were gone -- whether that means they were already handled by another thread or dropped, I do not know.

Conclusion:

I'm not yet a guru when it comes to the TCP stack and its interaction with userspace applications and threads, so don't bet the ranch on my analysis. It's also possible that some issue in OpenSSL or nsopenssl is causing this problem. The fact that nsopenssl is calling Ns_QueueConn and failing on the result tells me that the problem is occurring in AOLserver's connection management or in the operating system limitations. This may not be the result of a bug or bugs; it may just be what happens when you load the system beyond its performance boundaries -- systems tend to become non-deterministic when pushed too far. I'd have to overload a server and watch its behavior many times with different settings and instrumentation to see what might actually be happening.

If you have more information on the problems you encountered, I'll do my best to help within the time I have available.

/s.




On Jul 22, 2006, at 8:59 PM, Scott Goodwin wrote:

AOLserver actually manages the connections for nsopenssl. The nsopenssl code in question is:

    if (Ns_QueueConn(sdPtr->driver, scPtr) != NS_OK) {
        Ns_Log(Warning, "%s: connection dropped", sdPtr->module);
        (void) SockClose(scPtr);
    }

nsopenssl is getting something other than NS_OK back from Ns_QueueConn when the latter tries to append the connection to the run queue. AOLserver in turn may not have been keeping up with the load, but I'd first check your OS TCP pending connection limits. If your system was being hammered, it's possible your OS was turning away conns. I'm not sure I should have put this message in the log as it may not reflect what actually happened. Unless someone responds with a better answer, I'll take a closer look at the code tomorrow.

/s.

On Jul 22, 2006, at 7:45 PM, William Scott Jordan wrote:

Hi all!

We had a situation recently of extremely high traffic, during which connections were being rejected/dropped with the following warning showing up in the logs: "Warning: nsopenssl: connection dropped". I guess my questions are: what "limit" in nsopenssl is causing connections to be dropped? Can this limit be adjusted? Is there any way to catch this error to allow for a more graceful degradation, such as with a redirect to an unencrypted "Server Too Busy" page?

This is on AOLserver 3.5.6, nsopenssl 2.1a, and FC3.

Any light that anyone can shed on this would be greatly appreciated.

-Scott


--

To Remove yourself from this list, simply send an email to <[EMAIL PROTECTED]> with the
body of "SIGNOFF AOLSERVER" in the email message. You can leave the Subject: field of your email blank.



-- AOLserver - http://www.aolserver.com/
