Rainer Jung wrote:
On 22.04.2015 at 11:58, Thomas Boniface wrote:
What concerns me the most is the CLOSE_WAIT on the tomcat side, because when an fd peak appears the web application appears to be stuck. It feels like all its connections are consumed and none can be established from nginx anymore. Shouldn't the CLOSE_WAIT connections be recycled to receive new connections from nginx?

Just to clarify:

Every connection has two ends. In netstat output the "local" end is the left column, the "remote" end is the right column. If a connection is between two processes on the same system, it will be shown in netstat twice, once with each endpoint as the "local" side.

CLOSE_WAIT for a connection between a (local) and b (remote) means that b has closed the connection but a has not. Nothing closes a's end automatically just because b has closed its end. If CLOSE_WAIT connections pile up, then a and b disagree about when a connection should no longer be used. E.g. they might have very different idle timeouts (Keep-Alive timeout in HTTP speak), or one observed a problem that the other didn't observe.

When I did the counting for

  Count           IP:Port ConnectionState
   8381    127.0.0.1:8080 CLOSE_WAIT

the "127.0.0.1:8080" was in the left column of the netstat output, so it is the "local" end. It means the other side (whatever is at the other end of the connection, likely nginx) has already closed the connection, but Tomcat has not.

And the total number of those connections:

  Count           IP:Port ConnectionState
   8381    127.0.0.1:8080 CLOSE_WAIT
   1650    127.0.0.1:8080 ESTABLISHED

indeed adds up to 10031, i.e. roughly the default maxConnections of 10000 mentioned by Chris.
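For anyone who wants to redo this counting, here is a minimal sketch in Python. It assumes the six-column layout printed by "netstat -tan" on Linux (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State); the function name count_states and the sample rows are made up for illustration and are not taken from Thomas's system.

```python
from collections import Counter

def count_states(netstat_text, local_suffix=":8080"):
    """Count TCP states of connections whose LOCAL address ends with local_suffix."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Assumed row layout: Proto Recv-Q Send-Q Local Foreign State
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue  # skips the header line and non-TCP rows
        local, state = fields[3], fields[5]
        if local.endswith(local_suffix):
            counts[state] += 1
    return counts

# Invented sample rows, only to show the expected input shape.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 127.0.0.1:8080          127.0.0.1:51234         CLOSE_WAIT
tcp        0      0 127.0.0.1:8080          127.0.0.1:51235         ESTABLISHED
tcp        0      0 127.0.0.1:51234         127.0.0.1:8080          FIN_WAIT2
tcp        1      0 127.0.0.1:8080          127.0.0.1:51236         CLOSE_WAIT
"""

if __name__ == "__main__":
    for state, n in count_states(SAMPLE).most_common():
        print(f"{n:8d}  {state}")
```

The third sample row has 127.0.0.1:8080 in the remote column, so it is not counted: it is the same kind of connection seen from the other end.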

What I do not understand is that the same connections, looked at with nginx as the local end, show totally different statistics:

  Count           IP:Port ConnectionState
  20119    127.0.0.1:8080 SYN_SENT
   4692    127.0.0.1:8080 ESTABLISHED
    488    127.0.0.1:8080 FIN_WAIT2
    122    127.0.0.1:8080 TIME_WAIT
     13    127.0.0.1:8080 FIN_WAIT1

But maybe that's a problem to solve after you have fixed the CLOSE_WAIT (or the 10000 limit) and redone the whole observation.

Pretty big numbers you have ...


Thomas,
to elaborate on what Rainer is writing above :

A TCP connection consists of 2 "pipes", one in each direction (client to server, server to client). From a TCP point of view, the "client" is the one which initially requests the connection. The "server" is the one which "accepts" that connection. (This is different from the more general idea of "server", as in "Tomcat server". When Tomcat accepts an HTTP connection, it acts as "server"; when a Tomcat webapp establishes a connection with an external HTTP server, the webapp (and by extension Tomcat) is the "client").

These 2 pipes can be closed independently of one another, but both need to be closed for the connection to be considered as closed and able to "disappear". When the client wants to close the connection, it will send a "close request" packet (a TCP segment with the FIN flag set) on the client-to-server pipe. The server receives this, and then knows that the client will not send anything more on that pipe. For a server application reading that pipe, this results in the equivalent of an "end of file" on that data stream. In response to the client's close request, the server is supposed to stop sending data on the server-to-client pipe and, in turn, send a "close request" on that pipe. Once these close messages have been received and acknowledged by both sides of the connection, the connection is considered closed, and the resources associated with it can be reclaimed/recycled/garbage collected etc. ("closed" is like a virtual state; it means that there is no connection).
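The two-pipe behaviour can be tried out on a loopback connection. The following is only a sketch in Python, not what Tomcat or nginx do internally; the names read_until_eof and demo_half_close are invented for this example. The client closes only its sending direction, the server sees "end of file", and between that moment and its own close() the server end of the connection is in CLOSE_WAIT.

```python
import socket

def read_until_eof(sock):
    """Read from sock until the peer has closed its sending direction."""
    chunks = []
    while True:
        data = sock.recv(1024)
        if not data:          # b"" is the "end of file" on this pipe
            break
        chunks.append(data)
    return b"".join(chunks)

def demo_half_close():
    # Listen on an ephemeral loopback port (port 0 lets the OS choose one).
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)

    client = socket.create_connection(listener.getsockname())
    server_side, _ = listener.accept()

    client.sendall(b"ping")
    client.shutdown(socket.SHUT_WR)   # client-to-server pipe closed (FIN sent)

    received = read_until_eof(server_side)
    # From here until server_side.close(), the server end is in CLOSE_WAIT.

    server_side.sendall(b"bye")       # the server-to-client pipe still works
    server_side.close()               # server closes its pipe too
    reply = read_until_eof(client)

    client.close()
    listener.close()
    return received, reply

if __name__ == "__main__":
    print(demo_half_close())
```

If the server application never reaches its close() call, that socket stays in CLOSE_WAIT for as long as the process keeps it open, which is the kind of pile-up discussed in this thread.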

But if one side fails to fulfill its part of that contract, the connection is still there, and it remains there until something forcefully terminates it. All the resources tied to that connection also remain tied to it, and are subtracted from the overall resources which the server has available to perform other tasks. From a server point of view, the "ideal" situation is when all connections are actually "active" and really being used to do something useful (sending or receiving data e.g.). The worst situation is when there are many "useless" connections: connections in some state or other, not actually doing anything useful, but tying up resources nevertheless. This can get to the point where some inherent limit is reached, and the server cannot accept any more connections, although in theory it still has enough other resources available which would allow it to process more useful transactions.

Most of the "TCP states" that you see in the netstat output are transient, and last only a few milliseconds usually. They are just part of the overall "TCP connection lifecycle" which is cast in stone and which you can do nothing about. But, for example, if there is a permanent very high number of connections in the CLOSE_WAIT state, that is not "normal".

See here for an explanation of these TCP states, in particular CLOSE_WAIT :
http://www.tcpipguide.com/free/t_TCPOperationalOverviewandtheTCPFiniteStateMachineF-2.htm

According to Rainer's counts above, you have 1650 connections in the ESTABLISHED state (and for the time being, let's suppose that these are actually busy doing something useful). But you also have 8381 connections in the CLOSE_WAIT state. These are not doing anything useful, but they are blocking resources on your server. One essential resource which they are blocking, is that there is (currently) a maximum *total* of 10,000 connections which can be in existence at any one time, and these CLOSE_WAIT connections are occupying (uselessly) 8381 of these "slots" (84%).

The precise reason why there are this many connections in that state is not clear to us, but my money is on either some misconfiguration of the nginx-tomcat connections, or some flaw in the application.

One thing which you could try, and which might provide a clue, is to do the following in quick succession:
1) a "netstat" command to see how many connections are in CLOSE_WAIT state
2) /force/ a GC for Tomcat (*)
3) the same netstat command again, to check how many CLOSE_WAIT connections there are now

(*) someone else here should be able to contribute the easiest way to achieve this
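The three steps above can be sketched as a small Python script. Everything in it is an assumption for illustration: it supposes a Linux "netstat -tan", and it uses the JDK tool "jcmd <pid> GC.run" as one possible way to request a GC (it has to run as the same user as the Tomcat JVM, and the request is ignored if the JVM was started with -XX:+DisableExplicitGC). The function names are invented, and others on the list may know a simpler way.

```python
import subprocess

def count_close_wait(netstat_text, local_suffix=":8080"):
    """Count CLOSE_WAIT rows whose local address ends with local_suffix."""
    total = 0
    for line in netstat_text.splitlines():
        fields = line.split()
        # Assumed row layout: Proto Recv-Q Send-Q Local Foreign State
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            if fields[3].endswith(local_suffix) and fields[5] == "CLOSE_WAIT":
                total += 1
    return total

def run_command(cmd):
    """Run an external command and return its standard output as text."""
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

def close_wait_around_gc(tomcat_pid, runner=run_command):
    """Return (CLOSE_WAIT count before GC, CLOSE_WAIT count after GC)."""
    before = count_close_wait(runner(["netstat", "-tan"]))
    runner(["jcmd", str(tomcat_pid), "GC.run"])   # ask the JVM for a full GC
    after = count_close_wait(runner(["netstat", "-tan"]))
    return before, after

if __name__ == "__main__":
    import sys
    print(close_wait_around_gc(int(sys.argv[1])))
```

If the second number is much lower than the first, that would hint at sockets which the application abandoned without closing, and which only get closed when the GC cleans up the abandoned objects.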