I hope the following is not too long and confusing ...

On 03.05.2011 22:02, Mark Thomas wrote:
Scenario
--------
This ended up being very long, so I moved it to the end. The exact
pattern of delays will vary depending on timeouts, request frequency
etc. but the scenario shows an example of how delays can occur. The
short version is that requests with data to process (particularly new
connections) tend to get delayed in the queue waiting for a thread to
process them when the threads are all tied up processing keep-alive
connections.

Root cause
----------
The underlying cause of all of the performance issues observed is that
the threads are tied up doing HTTP keep-alive when there is no data to
process, while there are other connections in the queue that do have
data that could be processed.

Solution A
----------
NIO is designed to handle this using a poller. That isn't available to
BIO so I attempted to simulate it. That generated excessive CPU load so
I do not think simulated polling is the right solution.
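
I assume the simulated poll in each worker thread looked roughly like this (my own sketch with made-up names, not the actual patch): set a short SO_TIMEOUT on the blocking socket, try to read, and treat a SocketTimeoutException as "no data yet".

import java.io.IOException;
import java.io.PushbackInputStream;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Sketch only: one worker thread "polling" a blocking socket.
class SimulatedPoll {

    static final int POLL_TIMEOUT_MS = 100; // the "poll" timeout discussed below

    // Returns true once data is available, false if the client closed the connection.
    static boolean waitForData(Socket socket) throws IOException {
        socket.setSoTimeout(POLL_TIMEOUT_MS);
        PushbackInputStream in = new PushbackInputStream(socket.getInputStream());
        while (true) {
            try {
                int b = in.read();   // blocks for at most POLL_TIMEOUT_MS
                if (b == -1) {
                    return false;    // connection closed by the client
                }
                in.unread(b);        // give the byte back to the real request parser
                return true;
            } catch (SocketTimeoutException e) {
                // No data within the poll interval. A simulated poller would now
                // hand the connection back to a queue and pick up another one;
                // with mostly idle keep-alive connections this branch fires once
                // per thread every POLL_TIMEOUT_MS.
            }
        }
    }
}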

I expect generating the SocketTimeoutException is expensive, because the JVM has to fill in the stack trace. The rate of exceptions when handling mostly keep-alive connections (the extreme case) is the number of threads divided by your "poll" timeout, e.g. 200 threads with a 100ms timeout means 2000 exceptions per second. Even if there is another reason for the high CPU load, I expect it to be roughly proportional to the poll rate. In a saturated system with lots of keep-alive connections you will have:

pollRate = maxThreads / pollTimeout
(e.g. 200 / 0.1s = 2000/s)
averageWaitBeforePoll = maxConnections / pollRate / 2
(e.g. 10000 / (2000/s) / 2 = 2.5s)
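
Spelled out as a tiny calculation (nothing Tomcat-specific; the class and method names are mine, purely for illustration):

// Arithmetic sketch only: relates maxThreads, pollTimeout and maxConnections.
public class PollMath {

    static void print(int maxThreads, double pollTimeoutSeconds, int maxConnections) {
        // One poll attempt per thread per pollTimeout.
        double pollRate = maxThreads / pollTimeoutSeconds;
        // On average a connection waits for half a sweep over all connections.
        double averageWaitBeforePoll = maxConnections / pollRate / 2;
        System.out.printf("pollRate=%.0f/s averageWaitBeforePoll=%.2fs%n",
                pollRate, averageWaitBeforePoll);
    }

    public static void main(String[] args) {
        // The sizing above: 200 threads, 100ms poll timeout, 10000 connections.
        print(200, 0.1, 10000);   // pollRate=2000/s averageWaitBeforePoll=2.50s
    }
}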

So we see that in your case, although we already have a high poll event rate, every connection only gets polled every 2.5 seconds, which is far too much request latency. If we want to reduce this latency, we would need to increase the rate, but then the CPU load gets even worse. Or we need to reduce maxConnections.

Let us try a different sizing:

maxThreads 200, maxConnections 1000 (less overcommitment, but still well above the 200 threads), pollTimeout 200ms.

pollRate = 1000/s, half of the previous rate because of the doubled timeout.
averageWaitBeforePoll = 0.5s.

Although this is an improvement, we still have a high poll rate, and even a 0.5 second average wait for new connections isn't nice.
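
The same little arithmetic helper from above, fed with this second sizing (again just illustration):

// 200 threads, 200ms poll timeout, 1000 connections.
PollMath.print(200, 0.2, 1000);   // pollRate=1000/s averageWaitBeforePoll=0.50s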

The tradeoff is: to be CPU efficient we have to reduce the poll rate, but with a fixed thread and connection count that automatically means a longer averageWaitBeforePoll (= maxConnections * pollTimeout / (2 * maxThreads)), i.e. more request latency. There seems to be no sweet spot when sizing the system.

If we do not find an efficient way (in terms of CPU and thread blocking time) to handle the keep-alive connections, then I don't expect a solution to the problem - except for disabling keep-alive or not accepting many more connections than we have threads. In the end that's the "75% of threads busy, then disable keep-alive" solution. One could throw in some "reduce the keep-alive timeout under load" feature, but I doubt it would help much more than the simple solution.
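
Just to spell out how simple that heuristic is (a sketch with made-up names, not the actual Tomcat 6 code):

// Sketch of the "disable keep-alive under load" rule; names are illustrative.
class KeepAlivePolicy {

    private final int maxThreads;

    KeepAlivePolicy(int maxThreads) {
        this.maxThreads = maxThreads;
    }

    // Offer keep-alive only while fewer than 75% of the worker threads are busy;
    // above that threshold every response is sent with "Connection: close" so
    // threads become free for connections waiting in the queue.
    boolean allowKeepAlive(int busyThreads) {
        return busyThreads < maxThreads * 0.75;
    }
}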

Do we see a way of handling many keep-alive connections that is efficient in terms of CPU time and thread blocking time? I don't see any API that would help here. Of course one could try to build a hybrid "blocking for normal processing but non-blocking for keep-alive" thing, but since we already have NIO I would also support recommending NIO for keep-alive.

Switching the default from BIO to NIO is a big change, but only after we switch will we find the last buglets and problems arising under rare conditions. So if we want to switch, we should do it very soon. Doing it late in the TC 7 cycle would be bad.

Lastly: APR is used for server-to-server connections, as is HTTP when a reverse proxy runs in front of Tomcat. In those cases we have far fewer connections with a higher rate of requests per connection. There maxThreads == maxConnections is fine (and even the 75% rule could be switched off). So for this scenario it would be nice not to drop BIO, at least until the major TC version after the default has switched to NIO.

Solution B
----------
Return to the Tomcat 6 implementation where maxConnections == maxThreads.

Additional clean-up
-------------------
maxConnections is unnecessary in APR since pollerSize performs the same
function.

Summary
-------
The proposed changes are:
a) restore disabling keep-alive when threads used >= 75% of maxThreads
b) remove maxConnections and associated code from the APR connector
c) remove the configuration options for maxConnections from the BIO
connector
d) use maxThreads instead of maxConnections for the BIO connector
e) update the docs

I agree (especially after your additional clarifications in reply to Konstantin).

Rainer
