On Mon, 12 Apr 2021, 09:07 Mark Thomas, <ma...@apache.org> wrote:

> On 11/04/2021 11:03, Peter Chamberlain wrote:
>
> <snip/>
>
> > I've been investigating this some more, as I'm not convinced nio2 isn't
> > behaving strangely in this case. I think there may have been some sort of
> > regression, as nio2 is much less likely to refuse connections in Tomcat
> > 9.0.13 than in 9.0.14. I'm wondering if it has something to do with:
> >
> >           Avoid using a dedicated thread for accept on the NIO2 connector,
> >           it is always less efficient. (remm)
> >
> > And whether it is hitting some sort of accept thread starvation case when
> > it is fully loaded. In Tomcat 9.0.13 I can hit a maxThreads=200 nio2
> > connector with 5000 jmeter threads and not experience a connection refused,
> > but in 9.0.14 I can't reach 1000 without refused connections. It doesn't
> > seem to be related to forwards or redirects either. If I just sleep for
> > 1500 milliseconds on every servlet run, without redirecting or forwarding,
> > it behaves the same.
> > We've been using nio2 in our tomcats exclusively for some time, as we hit
> > an issue with nio in the past (can't remember what it was, it is likely
> > fixed by now I would think), so I guess we're more likely to notice this
> > sort of thing.
>
> I think you are asking the wrong question(s). 200 threads with a 1500ms
> wait means I would expect Tomcat to be processing ~133 requests per
> second. (Assuming you have at least 200 client threads as well). Higher
> numbers of client threads, the timeouts configured on the client, the
> timeouts configured on Tomcat, the accept count etc shouldn't change the
> requests per second results. What will change is the failure scenarios
> you observe - and I think that is what you are seeing here between
> 9.0.13 and 9.0.14. 9.0.13 might be accepting more connections but that
> doesn't mean those connections are being processed faster. Depending on
> timeouts, they might (eventually) get processed or they might time out.
>
> You might want to try the following:
> - Limit the number of loops to, say, 10 so you get 50,000 requests. Look
> at the response time stats. What is the average? What is the min/max?
> - Repeat the test. Do the results remain consistent?
> - Repeat the test with more loops. Do the results remain consistent?
> - Repeat the test with fewer client threads. At what point do you start
> to get consistent results?
>
> It may well be that changes to Tomcat over time have changed the way
> Tomcat behaves under various (overloaded network) failure scenarios.
>
> My reading of the change that you reference above does mean that Tomcat
> will only accept a new connection over NIO2 when it has a processing
> thread available to process it. That will change the way Tomcat behaves
> when presented with a large spike of new connections. (Significantly)
> increasing the acceptCount (a.k.a. backlog) to more than the number of
> connections expected in a single "spike" in 9.0.14 should give
> 9.0.13-like behaviour.
>
> HTH,
>
> Mark
>

I understand what you are saying. I'm actually only hitting it with 1000
requests in total, and approximately 300 are failing with connection refused.
This isn't just the first run either, so it isn't a JVM warm-up issue. I'm
overloading the number of threads (200), but Tomcat doesn't handle that
overload in the way that might be expected (just delaying processing):
it's failing some requests inside 7 seconds, even with a high accept count,
max connections, and connection timeouts. Essentially, we're looking at cases
where we are overloaded for short periods, and trying to cope with that
without a bad customer response. This is for a link server of sorts, so the
result at present is that people clicking links get failures rather than
delays. Obviously we can increase the number of threads to mitigate this to
some degree (although that increases the resources used), we're looking at
improving the performance too, and we can spread the load over more servers
if necessary. I'm still concerned this is likely to happen for this
application, so I have recommended we switch back to nio instead, as it seems
to cope better with it. There is a difficult balance here between sufficient
performance and coping with DDoS attempts, so I understand it's not
really a simple area. I just thought you should know that 9.0.14 made it much
worse compared to 9.0.13, in case this query comes up again.
Obviously, waiting a long time for link clicks to work is also undesirable;
we are really just looking at worst-case scenarios here.
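
As a rough sanity check on the figures in this thread, here is a minimal Java
sketch of an idealised model. The model is an assumption, not a description of
Tomcat internals: every request holds one of the 200 threads for exactly
1500 ms, there is no network or queueing overhead, and no connection is
refused. The class and method names are made up for illustration.

```java
// Hypothetical helper, not part of Tomcat: back-of-envelope numbers for a
// fixed-size thread pool where every request takes the same service time.
final class BurstDrainEstimate {

    // Steady-state throughput: each thread completes 1000/serviceTimeMillis
    // requests per second, and workerThreads of them run in parallel.
    public static double requestsPerSecond(int workerThreads, long serviceTimeMillis) {
        return workerThreads * 1000.0 / serviceTimeMillis;
    }

    // Time to finish a burst that arrives all at once, assuming requests are
    // served in "waves" of workerThreads and nothing is refused or times out.
    public static long drainTimeMillis(int burstSize, int workerThreads, long serviceTimeMillis) {
        long waves = (burstSize + workerThreads - 1L) / workerThreads; // ceiling division
        return waves * serviceTimeMillis;
    }

    public static void main(String[] args) {
        int threads = 200;        // maxThreads from the test described above
        long serviceTime = 1500;  // servlet sleep in milliseconds
        int burst = 1000;         // total requests in the failing test

        System.out.printf("throughput: %.1f req/s%n", requestsPerSecond(threads, serviceTime));
        System.out.printf("burst drain time: %d ms%n", drainTimeMillis(burst, threads, serviceTime));
    }
}
```

Under these assumptions the throughput works out to roughly 133 requests per
second, matching the estimate quoted above, and a burst of 1000 requests would
need five waves, about 7.5 seconds, to be fully processed if every connection
were queued rather than refused.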

Best regards, Peter.


> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
> For additional commands, e-mail: users-h...@tomcat.apache.org
>
