Hi Scott,
This is fantastic information and definitely gives me some direction
in my effort to keep things stable under heavy load. I really
appreciate your help on this.
On a somewhat related topic... I'm sure that at least a few of you
are running some high-traffic sites on AOLserver. I've been told
that I need to prep our system to handle bursts of 100+ page views
per second by the end of the year. What kind of setups are other
people using to handle this kind of traffic? We will be outsourcing
our DB management, so I'm not as concerned about that side of
things. I'm mostly interested in knowing what kind of hardware
configurations people are using on the webserver side of the equation
(load balancers, web servers, etc.) and if there are any special
AOLserver configuration tweaks that would help with these kinds of loads.
-Scott
At 08:35 PM 7/23/2006, you wrote:
I've done more research on the issues you had with your very heavily
loaded server, and here's a summary of some of the configurable
parameters that affect how many connections a server can handle at one time.
Part I: The Operating System listen backlog limits
Network applications use the listen() system call to notify the
operating system that they want to receive connections on a specific
port. The operating system's TCP stack receives new connections and
holds them until the application uses the accept() system call which
ties the new connection to the userspace application. The TCP stack
can hold a limited number of new connections that have not yet been
accept()ed by the application -- this is called the "listen
backlog". Eons ago, the default maximum listen backlog for most
Unices was 5; when there were 5 new connections in the listen queue
that had not been accept()ed by the application, all new connections
coming in were dropped until the backlog dropped below 5. Most
operating systems now have higher defaults, as specified by the
SOMAXCONN constant defined in <sys/socket.h>: the Linux 2.6 kernel
and Mac OS 10.4.x both set SOMAXCONN to 128. If your server is receiving
new connections at a rate faster than your application can accept()
them, and the listen backlog builds to 128, new connections after
that will be dropped until the backlog is reduced to below 128.
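This doesn't mean you're stuck with this value; it is just what
the operating system defaults to at boot time. You can change
this value on Solaris with:
    /usr/sbin/ndd -set /dev/tcp tcp_conn_req_max_q 1024
and on Linux with:
    /sbin/sysctl -w net.ipv4.tcp_max_syn_backlog=1024
In fact, my Gentoo system modifies this parameter at boot time to be 1024.
Note that on Linux 2.2 and later this parameter sizes the queue of
half-open connections; the cap on the listen() backlog of completed
connections is net.core.somaxconn, which can be raised the same way:
    /sbin/sysctl -w net.core.somaxconn=1024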
My guess is that your operating system has max listen backlog set to
1024 or higher, but I don't have a Solaris system to check or test this (yet).
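To check what a Linux box is actually using, the current values can be read back from /proc/sys. The following is only a minimal sketch: the helper name read_proc_tunable is made up for illustration, and it assumes a Linux system where /proc is mounted. The two paths of interest would be /proc/sys/net/core/somaxconn (the cap on the queue of completed connections, per the listen(2) man page) and /proc/sys/net/ipv4/tcp_max_syn_backlog (the queue of half-open connections).

```c
#include <stdio.h>

/*
 * Hypothetical helper: read a single integer tunable from a /proc/sys
 * file (Linux-specific). Returns the value, or -1 if the file cannot
 * be opened or does not start with an integer.
 */
long read_proc_tunable(const char *path)
{
    FILE *fp = fopen(path, "r");
    long value = -1;

    if (fp == NULL) {
        return -1;
    }
    if (fscanf(fp, "%ld", &value) != 1) {
        value = -1;
    }
    fclose(fp);
    return value;
}
```

Calling read_proc_tunable("/proc/sys/net/core/somaxconn") before and after running sysctl should show whether the change took effect. On Solaris the equivalent check would go through ndd instead.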
Part II: AOLserver listen backlog limits
You can change the max listen backlog with the listen() call itself
on a per-application basis: the second argument to listen() is an
integer value of what you want the backlog to be set to for that
application. If you set the backlog to be greater than the operating
system setting, you'll get the operating system's listen backlog
value, not what you requested, but you won't know that because
normally your setting is silently truncated to match the lower
operating system value.
If you set the listen backlog to less than the operating system's
max listen backlog, you'll get what you asked for. This is a good
way to prevent a single application on a system where there are
several network server applications running from hogging the listen backlog.
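The clamping behaviour described above can be sketched in a few lines of C. This is an illustration only, not AOLserver's code: effective_backlog and open_listener are names invented for this example, and the listener binds to the loopback address purely to keep the sketch harmless to run.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * The backlog the kernel ends up using is the smaller of what the
 * application asked for and the system-wide maximum.
 */
int effective_backlog(int requested, int system_max)
{
    return requested < system_max ? requested : system_max;
}

/*
 * Open a TCP listener on 127.0.0.1:port with the requested backlog.
 * Passing port 0 lets the kernel pick a free port.
 * Returns the listening descriptor, or -1 on failure.
 * listen() itself gives no indication if the backlog was truncated.
 */
int open_listener(unsigned short port, int backlog)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0) {
        return -1;
    }
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *) &addr, sizeof(addr)) < 0 ||
        listen(fd, backlog) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

With a system maximum of 128, effective_backlog(1024, 128) gives 128 and effective_backlog(32, 128) gives 32, which matches the two cases above: a request above the limit is silently cut down, and a request below it is honoured.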
AOLserver 3.5.6 limits the listen backlog to 32 new connections via
the BACKLOG define in nsconf.h. If your server is getting new
connections at a rate faster than AOLserver can accept them, and you
reach the limit of 32 in the listen queue, connections will be
dropped until the backlog drops below 32.
You can change AOLserver's listen backlog by creating the
"listenbacklog" param in your nsd.tcl file and setting it to an
integer value you would like. I'm guessing that this param should be
set in the server section, but I haven't validated this. You could
also change it by changing the BACKLOG define and recompiling.
Part III: Thread and select()/poll() interactions
The bad news is that this may not be what you saw on your server. I
say that because if connections are being dropped by the operating
system before they are accepted by the application, then the
application would never even see them or know the connection
attempts had been made, and so could not log that the connection had
been dropped. But, according to the notes in the accept() man page
on Linux: "There may not always be a connection waiting after a
SIGIO is delivered or select(2) or poll(2) return a readability
event because the connection might have been removed by an
asynchronous network error or another thread before accept() is
called." It's quite possible that AOLserver was getting readability
events but when it went to process them, the connections were gone
-- whether that means they were already handled by another thread or
dropped I do not know.
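For what it's worth, the usual way to cope with the case the man page describes is to keep the listening socket non-blocking and treat "nothing there after all" as a non-event rather than a failure. Here is a minimal sketch under those assumptions; the function names are invented for illustration and this is not how AOLserver's driver thread is actually written.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Put a descriptor into non-blocking mode. Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags < 0) {
        return -1;
    }
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/*
 * Returns 1 if an accept() errno means the connection simply was not
 * there any more (taken by another thread, aborted by the peer, or the
 * call was interrupted), 0 if it is a real error.
 */
int accept_error_is_transient(int err)
{
    return err == EAGAIN || err == EWOULDBLOCK ||
           err == ECONNABORTED || err == EINTR;
}

/*
 * Wait up to timeout_ms for a readability event, then try to accept.
 * The listening socket should already be non-blocking, otherwise
 * accept() could block if the connection vanished after poll() returned.
 * Returns the new descriptor, -1 if nothing was there (timeout or the
 * connection was gone), or -2 on a real error.
 */
int try_accept(int listen_fd, int timeout_ms)
{
    struct pollfd pfd;
    int ready, conn;

    pfd.fd = listen_fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    ready = poll(&pfd, 1, timeout_ms);
    if (ready == 0) {
        return -1;
    }
    if (ready < 0) {
        return errno == EINTR ? -1 : -2;
    }

    conn = accept(listen_fd, NULL, NULL);
    if (conn >= 0) {
        return conn;
    }
    return accept_error_is_transient(errno) ? -1 : -2;
}
```

Counting how often try_accept returns -1 right after a readability event would be one way to tell whether connections are disappearing between the event and the accept() call.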
Conclusion:
I'm not yet a guru when it comes to the TCP stack and its
interaction with userspace applications and threads, so don't bet
the ranch on my analysis. It's also possible that there's some issue
in OpenSSL or nsopenssl that is causing this problem. The fact that
nsopenssl is calling Ns_QueueConn and failing on the result tells me
that the problem is occurring in AOLserver's connection management or
in the operating system limitations. This may not be the result of a
bug or bugs; it may just be what happens when you load the system
beyond its performance boundaries -- systems tend to become
non-deterministic when pushed too far. I'd have to overload a server
and watch its behavior many times with different settings and
instrumentation to see what might actually be happening.
If you have more information on the problems you encountered, I'll
do my best to help within the time I have available.
/s.
On Jul 22, 2006, at 8:59 PM, Scott Goodwin wrote:
AOLserver actually manages the connections for nsopenssl. The
nsopenssl code in question is:
    if (Ns_QueueConn(sdPtr->driver, scPtr) != NS_OK) {
        Ns_Log(Warning, "%s: connection dropped", sdPtr->module);
        (void) SockClose(scPtr);
    }
nsopenssl is getting something other than NS_OK back from
Ns_QueueConn when the latter tries to append the connection to the
run queue. AOLserver in turn may not have been keeping up with the
load, but I'd first check your OS TCP pending connection limits. If
your system was being hammered, it's possible your OS was turning
away conns. I'm not sure I should have put this message in the log
as it may not reflect what actually happened. Unless someone
responds with a better answer, I'll take a closer look at the code tomorrow.
/s.
On Jul 22, 2006, at 7:45 PM, William Scott Jordan wrote:
Hi all!
We recently had a period of extremely high traffic, during
which connections were being rejected/dropped with the following
warning showing up in the logs: "Warning: nsopenssl: connection
dropped". I guess my questions are: what "limit" in nsopenssl is
causing connections to be dropped? Can this limit be
adjusted? Is there any way to catch this error to allow for a
more graceful degradation, such as with a redirect to an
unencrypted "Server Too Busy" page?
This is on AOLserver 3.5.6, nsopenssl 2.1a, and FC3.
Any light that anyone can shed on this would be greatly appreciated.
-Scott
--
AOLserver - http://www.aolserver.com/
To Remove yourself from this list, simply send an email to
<[EMAIL PROTECTED]> with the
body of "SIGNOFF AOLSERVER" in the email message. You can leave
the Subject: field of your email blank.