Mark,

On 11/18/22 10:38, Mark Thomas wrote:
I've just found the comment I missed the first time.

This has already been implemented.

You need to set the following:

Connector
   bindOnInit="false"

Service
   gracefulStopAwaitMillis="20000"

This assumes you are using the default keep-alive timeout of 20s. If not, adjust gracefulStopAwaitMillis accordingly. It needs to be at least as long as the keep-alive timeout.

Should we cross-check these two configuration items and issue WARNING at startup if gracefulStopAwaitMillis < keepAliveTimeout?

-chris

On 18/11/2022 14:59, Mark Thomas wrote:
PRs are always welcome. From the description provided, I think more use of the existing graceful shutdown could be made.

I am also looking at whether a slightly broader refactoring is called for as there looks to be scope for better alignment between pause, stop and graceful stop.

Mark


On 18/11/2022 14:29, M. Thiim wrote:
Hi Mark

Great! I've actually made an experimental patch that fixes it, but it's
perhaps not completely clean. To get the behavior the waiting needs to
happen after the server has stopped accepting new connections, this seems to be the pause-phase. But the Http11Processoer rejects incoming requests during the pause phase (condition check in the while loop in the service()
method and also further down in the method), so I had to change this
behaviour as well (so it accepts requests but responds to them with
Connection: Close). I don't know if this breaks some assumptions that it
now continues to process requests in the pause phase. I also had to add a
.waitForConnectionDrain() to the Connector class and to the protocol
handler.

Also, in order to avoid waiting for the full duration of the
keepAliveTimeout in the cases when there's either no kept alive connections to begin with or where the server is able to close them earlier (due to new requests coming in that can be responded with Connection: Close, as would
typically be the case in a busy system) I had to have a count on current
number of connections. The getConnectionCount() in AbstractEndpoint.java
seemed useful but there's catch: Its counting is based on the value of the connectionLimitLatch and this is specifically nulled out during the pause phase (I suppose because it also has the function of limiting connections),
so the getConnectionCount() method returns -1 even if there are open
connections. So I added a separate AtomicInteger that tracks the number of
connections independently of the state and then changed
getConnectionCount() to use this in the situations where the
connectionLimitLatch is null.

I can clean up and share the patch, but let me know if you think it should
be done differently than described above. :-)

Br,  M. Thiim

Den fre. 18. nov. 2022 kl. 14.34 skrev Mark Thomas <ma...@apache.org>:

On 17/11/2022 19:39, M. Thiim wrote:
Hi,

We have observed that Tomcat doesn't gracefully close
keep-alive connections. Tomcat waits for already started requests to
complete, but once those are done, Tomcat will close all connections
immediately, irrespective of any configured keepAliveTimeout. This causes
problems for some HTTP clients, especially in Kubernetes-like
environments
when scaling down pods. Here, it can only work gracefully if the HTTP
client who falls victim to an unexpectedly closed connection retries on a
fresh connection, and it is not all clients that do this.

I would think that an entirely graceful shutdown sequence, in the
presence
of keep-alive connections, would work like the following:

1) Server receives shutdown request
2) Server immediately stops accepting new connections (already happens)
3) Server completes all requests already in  (already happens)
4) New behavior: If new requests come in on already established
keep-alive
connections those are processed, but a "Connection: close" is returned so
the client knows this connection can no longer be used. So at most one
more
request can be processed on each of those existing connections.
5) New behavior: When all keep-alive connections are gone, shutdown
proceeds. If there are still connections left after the keepAliveTimeout has passed, this means no requests can have been received on those during
the shutdown period (otherwise they would have been closed in #4). And
since Tomcat returned the keep-alive timeout value to the client when the
connection was setup, the client should know that the connection is no
longer usable. Therefore it is from this point safe for Tomcat to close
those remaining connections.
6) Rest of server shutdown continues

Seems a reasonable addition.

It looks like extending the behaviour when gracefulStopAwaitMillis is
set on the Service would work.

gracefulStopAwaitMillis would need to be greater than or equal to the
keep-alive timeout but we can document that as part of the patch.

Mark

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org
For additional commands, e-mail: dev-h...@tomcat.apache.org




---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org
For additional commands, e-mail: dev-h...@tomcat.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org
For additional commands, e-mail: dev-h...@tomcat.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org
For additional commands, e-mail: dev-h...@tomcat.apache.org

Reply via email to