Very interesting stuff, be sure to put it on the Wiki for posterity!
On Monday, July 24, 2006 4:35, Scott Goodwin said:
> I've done more research on the issues you had with your very heavily
> loaded server, and here's a summary of some of the configurable
> parameters that affect how many connections a server can handle at
> one time.
>
>
> Part I: The Operating System listen backlog limits
>
> Network applications use the listen() system call to notify the
> operating system that they want to receive connections on a specific
> port. The operating system's TCP stack receives new connections and
> holds them until the application uses the accept() system call which
> ties the new connection to the userspace application. The TCP stack
> can hold a limited number of new connections that have not yet been
> accept()ed by the application -- this is called the "listen backlog".
> Eons ago, the default maximum listen backlog for most Unices was 5;
> when there were 5 new connections in the listen queue that had not
> been accept()ed by the application, all new connections coming in
> were dropped until the backlog dropped below 5. Most operating
> systems now have higher defaults, given by the SOMAXCONN constant in
> the <sys/socket.h> header: the Linux 2.6 kernel and Mac OS X 10.4.x
> both set SOMAXCONN to 128. If your server is receiving new
> connections at a rate faster than your application can accept() them,
> and the listen backlog builds to 128, new connections after that will
> be dropped until the backlog is reduced to below 128.
>
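For the Wiki, here is a minimal C sketch of the listen()/backlog mechanics described above. It is an illustration under my own assumptions, not AOLserver code: the helper name open_listener and the choice of the loopback address are hypothetical, picked only to keep the example small.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: open a TCP listener on 127.0.0.1.
 * port 0 asks the kernel to pick any free port.
 * backlog is the requested listen backlog (second argument to listen()).
 * Returns the listening descriptor, or -1 on failure. */
int open_listener(unsigned short port, int backlog)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    /* Completed connections queue up in the kernel from listen() onward,
     * until the application calls accept() on this descriptor. */
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, backlog) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Calling open_listener(0, SOMAXCONN) requests the compile-time default maximum from the system headers; the kernel may still apply its own runtime limit.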
> This doesn't mean you're stuck with this value; it is just what the
> operating system defaults to at boot time. You can change this
> value on Solaris with:
>
> /usr/sbin/ndd -set /dev/tcp tcp_conn_req_max_q 1024
>
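> and on Linux with:
>
> /sbin/sysctl -w net.core.somaxconn=1024
>
> A related Linux setting, net.ipv4.tcp_max_syn_backlog, limits
> connections that are still in the middle of the TCP handshake rather
> than completed connections waiting for accept(). In fact, my Gentoo
> system sets tcp_max_syn_backlog to 1024 at boot time.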
>
> My guess is that your operating system has max listen backlog set to
> 1024 or higher, but I don't have a Solaris system to check or test
> this (yet).
>
>
> Part II: AOLserver listen backlog limits
>
> You can change the max listen backlog with the listen() call itself
> on a per-application basis: the second argument to listen() is an
> integer value of what you want the backlog to be set to for that
> application. If you set the backlog to be greater than the operating
> system setting, you'll get the operating system's listen backlog
> value, not what you requested, but you won't know that because
> normally your setting is silently truncated to match the lower
> operating system value.
>
> If you set the listen backlog to less than the operating system's max
> listen backlog, you'll get what you asked for. This is a good way to
> prevent a single application on a system where there are several
> network server applications running from hogging the listen backlog.
>
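The silent truncation described above can be sketched in C as follows. This is only an illustration under stated assumptions: effective_backlog and read_system_max are hypothetical helper names, and the /proc path is the Linux-specific file behind the net.core.somaxconn sysctl, so on other systems the fallback value is returned instead.

```c
#include <stdio.h>

/* Hypothetical helper: the backlog the kernel will actually honor is the
 * smaller of what the application requested and the system-wide maximum. */
int effective_backlog(int requested, int system_max)
{
    return requested < system_max ? requested : system_max;
}

/* Hypothetical helper: read the Linux system-wide maximum from
 * /proc/sys/net/core/somaxconn. Returns fallback if the file is missing
 * or unreadable (for example on a non-Linux system). */
int read_system_max(int fallback)
{
    int value = fallback;
    FILE *fp = fopen("/proc/sys/net/core/somaxconn", "r");

    if (fp != NULL) {
        if (fscanf(fp, "%d", &value) != 1)
            value = fallback;
        fclose(fp);
    }
    return value;
}
```

With a system maximum of 128, a request for 2048 yields 128, while a request for 32 is honored as 32. Since listen() gives no indication of the truncation, comparing the two values at startup and logging a warning is one way to notice it.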
> AOLserver 3.5.6 limits the listen backlog to 32 new connections via
> the BACKLOG define in nsconf.h. If your server is getting new
> connections at a rate faster than AOLserver can accept them, and you
> reach the limit of 32 in the listen queue, connections will be
> dropped until the backlog drops below 32.
>
> You can change AOLserver's listen backlog by creating the
> "listenbacklog" param in your nsd.tcl file and setting it to an
> integer value you would like. I'm guessing that this param should be
> set in the server section, but I haven't validated this. You could
> also change it by changing the BACKLOG define and recompiling.
>
>
> Part III: Thread and select()/poll() interactions
>
> The bad news is that this may not be what you saw on your server. I
> say that because if connections are being dropped by the operating
> system before they are accepted by the application, then the
> application would never even see them or know the connection attempts
> had been made, and so could not log that the connection had been
> dropped. But, according to the notes in the accept() man page on
> Linux: "There may not always be a connection waiting after a SIGIO is
> delivered or select(2) or poll(2) return a readability event because
> the connection might have been removed by an asynchronous network
> error or another thread before accept() is called." It's quite
> possible that AOLserver was getting readability events but when it
> went to process them, the connections were gone -- whether that means
> they were already handled by another thread or dropped I do not know.
>
> Conclusion:
>
> I'm not yet a guru when it comes to the TCP stack and its interaction
> with userspace applications and threads, so don't bet the ranch on my
> analysis. It's also possible that there's some issue in OpenSSL or
> nsopenssl that is causing this problem. The fact that nsopenssl is
> calling Ns_QueueConn and failing on the result tells me that the
> problem is occurring in AOLserver's connection management or in the
> operating system limitations. This may not be the result of a bug or
> bugs; it may just be what happens when you load the system beyond
> its performance boundaries -- systems tend to become
> non-deterministic when pushed too far. I'd have to overload a server and
> watch its behavior many times with different settings and
> instrumentation to see what might actually be happening.
>
> If you have more information on the problems you encountered, I'll do
> my best to help within the time I have available.
>
> /s.
>
>
>
>
> On Jul 22, 2006, at 8:59 PM, Scott Goodwin wrote:
>
> > AOLserver actually manages the connections for nsopenssl. The
> > nsopenssl code in question is:
> >
> >     if (Ns_QueueConn(sdPtr->driver, scPtr) != NS_OK) {
> >         Ns_Log(Warning, "%s: connection dropped", sdPtr->module);
> >         (void) SockClose(scPtr);
> >     }
> >
> > nsopenssl is getting something other than NS_OK back from
> > Ns_QueueConn when the latter tries to append the connection to the
> > run queue. AOLserver in turn may not have been keeping up with the
> > load, but I'd first check your OS TCP pending connection limits. If
> > your system was being hammered, it's possible your OS was turning
> > away conns. I'm not sure I should have put this message in the log
> > as it may not reflect what actually happened. Unless someone
> > responds with a better answer, I'll take a closer look at the code
> > tomorrow.
> >
> > /s.
> >
> > On Jul 22, 2006, at 7:45 PM, William Scott Jordan wrote:
> >
> >> Hi all!
> >>
> >> We had a situation recently of extremely high traffic, during
> >> which connections were being rejected/dropped with the following
> >> warning showing up in the logs: "Warning: nsopenssl: connection
> >> dropped". I guess my questions are: what "limit" in nsopenssl is
> >> causing connections to be dropped? Can this limit be adjusted?
> >> Is there any way to catch this error to allow for a more graceful
> >> degradation, such as with a redirect to an unencrypted "Server Too
> >> Busy" page?
> >>
> >> This is on AOLserver 3.5.6, nsopenssl 2.1a, and FC3.
> >>
> >> Any light that anyone can shed on this would be greatly appreciated.
> >>
> >> -Scott
> >>
> >>
> >> --
> >> AOLserver - http://www.aolserver.com/
> >>
> >> To Remove yourself from this list, simply send an email to
> >> <[EMAIL PROTECTED]> with the
> >> body of "SIGNOFF AOLSERVER" in the email message. You can leave
> >> the Subject: field of your email blank.